AI RESEARCH
Attention as In-Context Empirical Bayes: A Two-Stage View via Particle Dynamics
arXiv cs.LG
arXiv:2605.29351v1 Announce Type: new
We study minimal attention-only transformers under all-token corruption and show they admit a two-stage empirical Bayes interpretation. A single attention step computes a kernel-weighted posterior mean with respect to the empirical distribution defined by the context. Depth refines this distribution through particle dynamics (
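
To make the "kernel-weighted posterior mean" reading concrete, here is a minimal sketch in plain Python. It assumes a Gaussian kernel with isotropic noise of scale `sigma` and a prior equal to the empirical distribution of the context points; the function name `posterior_mean`, the kernel choice, and `sigma` are illustrative assumptions and are not taken from the abstract. Under those assumptions, the posterior mean of a clean point given a noisy query is a softmax-weighted average of the context points, which is the same form as a single attention readout.

```python
import math


def posterior_mean(query, context, sigma=1.0):
    """Posterior mean given `query`, under an empirical prior on `context`.

    Assumption (not from the source): the query is a context point plus
    isotropic Gaussian noise with standard deviation `sigma`.
    """
    # Log-kernel scores: -||query - x_i||^2 / (2 * sigma^2) for each context point.
    logits = [
        -sum((q - x) ** 2 for q, x in zip(query, point)) / (2.0 * sigma ** 2)
        for point in context
    ]
    # Numerically stable softmax over the context points.
    top = max(logits)
    weights = [math.exp(score - top) for score in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted average of the context points, coordinate by coordinate.
    return [
        sum(w * point[d] for w, point in zip(weights, context))
        for d in range(len(query))
    ]


if __name__ == "__main__":
    context = [[0.0, 0.0], [2.0, 0.0]]
    # A query equidistant from both points gets equal weights.
    print(posterior_mean([1.0, 0.0], context))
    # With small noise, the estimate collapses onto the nearest context point.
    print(posterior_mean([0.1, 0.0], context, sigma=0.1))
```

When all context points have the same norm, the squared-distance scores differ from dot-product scores only by a constant that the softmax cancels, so this readout matches dot-product attention with temperature `sigma ** 2` in that special case.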