AI RESEARCH

Weight Decay Improves Language Model Plasticity

arXiv cs.AI

arXiv:2602.11137v2

Large language models are typically trained in two broad phases: pre