AI RESEARCH

Multi-Mixer Models: Flexible Sequence Modeling with Shared Representations

arXiv CS.LG

ArXi:2605.28769v1 Announce Type: new Softmax attention is the cornerstone of modern large language models, but its memory scales linearly and compute quadratically with sequence length. Linear recurrent models, such as linear attention and state space models, have become widely studied as alternatives to attention due to their linear compute and constant memory.