AI RESEARCH
Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics
arXiv cs.LG
arXiv:2605.20441v1 Announce Type: new
Transformers trained on modular arithmetic exhibit sharp transitions between memorization, generalization, and collapse. We show that weight decay acts as a scalar empirical control parameter for these regimes, and