AI RESEARCH

Attention Projection Mixing with Exogenous Anchors

arXiv CS.CL

arXiv:2601.08131v4 Announce Type: replace Cross-layer reuse of early attention projections can improve optimization and data efficiency, but it creates a structural conflict: the first layer must simultaneously act as a stable, reusable anchor for all deeper layers and as an effective computational block. We demonstrate that this tension constrains the performance of internal-anchor designs. We propose ExoFormer, which resolves the conflict by learning exogenous anchor projections outside the sequential layer stack.
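
To make the idea of an exogenous anchor concrete, here is a minimal NumPy sketch of one possible reading of the abstract. It is not the paper's formulation: the linear mixing rule, the per-layer scalar coefficient `mix`, the single-head attention, and all function names (`init_proj`, `forward`) are assumptions chosen for illustration. The anchor query/key/value projections are computed once from the input embeddings with weights that belong to no layer, and each layer blends them with its own projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Single-head scaled dot-product attention (no masking, for brevity).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def init_proj(rng, d):
    # Hypothetical helper: one set of Q/K/V weight matrices of shape (d, d).
    return {name: rng.normal(0.0, d ** -0.5, (d, d)) for name in ("q", "k", "v")}

def forward(x, layers, anchor, mix):
    # Assumption: anchor projections are computed once from the input
    # embeddings, using weights that sit outside the layer stack.
    anchor_qkv = {n: x @ anchor[n] for n in ("q", "k", "v")}
    h = x
    for w, lam in zip(layers, mix):
        # Assumption: each layer linearly mixes its own projection with the
        # shared anchor projection, weighted by a scalar lam in [0, 1].
        q, k, v = ((1 - lam) * (h @ w[n]) + lam * anchor_qkv[n]
                   for n in ("q", "k", "v"))
        h = h + attention(q, k, v)  # residual connection
    return h

rng = np.random.default_rng(0)
d, seq_len, n_layers = 8, 5, 3
x = rng.normal(size=(seq_len, d))
layers = [init_proj(rng, d) for _ in range(n_layers)]
anchor = init_proj(rng, d)  # exogenous: not one of the layers
out = forward(x, layers, anchor, mix=[0.5] * n_layers)
print(out.shape)
```

Under this sketch, an internal-anchor design would correspond to taking the anchor projections from the first layer instead of from the separate `anchor` weights, which is the dual role the abstract describes as a source of conflict.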