On-Policy Replay for Continual Supervised Fine-Tuning

arXiv:2605.29495v1 Announce Type: new Continual supervised fine-tuning (SFT) is the de facto recipe for adapting large language models (LLMs) to a stream of downstream tasks, but it suffers from catastrophic forgetting of earlier capabilities. Recent work shows that on-policy signals