Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations

ArXi:2605.28769v1 Announce Type: new Softmax attention is the cornerstone of modern large language models, but its memory scales linearly and compute quadratically with sequence length. Linear recurrent models, such as linear attention and state space models, have become widely studied as alternatives to attention due to their linear compute and constant memory.