Learning Fine-grained Parameter Sharing via Sparse Tensor Decomposition

arXiv:2411.09816v4 Announce Type: replace Large neural networks achieve state-of-the-art performance on many tasks, yet their sheer size hinders deployment on resource-constrained devices. Among existing compression approaches, cross-layer parameter sharing remains relatively unexplored for transformer models. In this paper, we