Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

arXiv:2605.26097v1 Announce Type: new
Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying exemplars from prior tasks, which is often impractical. By contrast, language models can sample from their own