AI RESEARCH

Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics

arXiv cs.LG

arXiv:2605.20441v1 Announce Type: new Transformers trained on modular arithmetic exhibit sharp transitions between memorization, generalization, and collapse. We show that weight decay acts as a scalar empirical control parameter for these regimes, and