Meta-Attention: Bayesian Per-Token Routing for Efficient Transformer Inference

arXiv:2605.28384v1 Announce Type: new

Standard transformer architectures apply a single attention mechanism uniformly across all tokens and sequence positions, irrespective of local context or computational budget. We propose Meta-Attention, a framework that uses a Bayesian Meta-Controller to dynamically route each token to the most appropriate of three attention strategies: full softmax attention, linear (kernel) attention, or sliding-window local attention.
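The abstract does not say how the Bayesian Meta-Controller reaches its decisions. The NumPy sketch below therefore treats the per-token routing decisions as a given input and shows only the dispatch step, in which each query position is served by exactly one of the three strategies.

Several details here are assumptions and do not come from the source:

- causal masking;
- the elu(x)+1 feature map for linear attention (a common choice in linear-attention work);
- the window size;
- the names `meta_attention`, `attend_token`, and the `FULL`/`LINEAR`/`LOCAL` constants, all of which are hypothetical.

```python
import numpy as np

# Hypothetical strategy ids; the paper's controller is NOT modeled here.
FULL, LINEAR, LOCAL = 0, 1, 2


def _softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()


def _phi(x):
    # elu(x) + 1: a strictly positive feature map (assumed choice for linear attention).
    return np.where(x > 0, x + 1.0, np.exp(x))


def attend_token(i, Q, K, V, strategy, window):
    """Causal attention output for query position i under one strategy (sketch)."""
    q, d = Q[i], Q.shape[1]
    if strategy == FULL:
        # Softmax attention over all positions 0..i.
        scores = K[: i + 1] @ q / np.sqrt(d)
        return _softmax(scores) @ V[: i + 1]
    if strategy == LOCAL:
        # Softmax attention restricted to the last `window` positions.
        lo = max(0, i - window + 1)
        scores = K[lo : i + 1] @ q / np.sqrt(d)
        return _softmax(scores) @ V[lo : i + 1]
    if strategy == LINEAR:
        # Kernel attention: phi(q)^T S / phi(q)^T z, with S = sum_j phi(k_j) v_j^T
        # and z = sum_j phi(k_j). Recomputed per token here for clarity only.
        fk = _phi(K[: i + 1])   # (i+1, d)
        fq = _phi(q)            # (d,)
        S = fk.T @ V[: i + 1]   # (d, d_v)
        z = fk.sum(axis=0)      # (d,)
        return (fq @ S) / (fq @ z)
    raise ValueError(f"unknown strategy {strategy}")


def meta_attention(Q, K, V, route, window=4):
    """Dispatch each token i to the strategy route[i] chosen by some external controller."""
    return np.stack(
        [attend_token(i, Q, K, V, int(r), window) for i, r in enumerate(route)]
    )


# Usage with random inputs and a hand-written (hypothetical) routing decision.
rng = np.random.default_rng(0)
T, d = 6, 8
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
route = [FULL, LINEAR, LOCAL, FULL, LOCAL, LINEAR]
out = meta_attention(Q, K, V, route, window=3)
print(out.shape)  # (6, 8)
```

Recomputing the linear-attention sums for every token keeps the sketch short. A practical implementation would carry running prefix sums, and would compute local attention over only the window, so that tokens routed away from full attention actually cost less.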