Weight Decay Improves Language Model Plasticity
arXiv CS.AI
arXiv:2602.11137v2 Announce Type: replace-cross Large language models are typically trained in two broad phases: pre