AI RESEARCH

When Should the Teacher Move? Temporal Coupling and Stability in Self On-Policy Distillation

arXiv cs.LG

arXiv:2606.03532v1

Self on-policy distillation trains a student policy against a teacher derived from its own parameter history, yet the teacher's update schedule, which governs the temporal coupling between teacher and student, has not been systematically studied as a stability variable. Through a controlled schedule sweep on Qwen3-8B, we establish that isolation periods, defined as complete teacher freezing between updates, are the key structural property enabling stable learning, rather than teacher age. To characterize these underlying ...
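The abstract separates two properties of a teacher schedule: teacher age (how stale the teacher's checkpoint is) and isolation periods (stretches of steps during which the teacher is completely frozen). The sketch below is a hypothetical illustration, not the paper's method: it assumes the teacher is always some earlier checkpoint from the student's own history and that a schedule can be modeled as a map from training step to checkpoint index. The names `periodic_copy`, `lagged_copy`, `teacher_ages`, and `isolation_periods` are invented for this example, and no model training or Qwen3-8B result is reproduced.

```python
from typing import Callable, List

# Hypothetical model of a teacher-update schedule: a function mapping a
# training step to the index of the student checkpoint (taken from the
# student's own parameter history) that serves as the teacher at that step.
Schedule = Callable[[int], int]


def periodic_copy(period: int) -> Schedule:
    """Refresh the teacher every `period` steps; it is frozen in between."""
    return lambda step: (step // period) * period


def lagged_copy(lag: int) -> Schedule:
    """Teacher is the checkpoint from `lag` steps ago (clamped at step 0).
    After the first `lag` steps it changes at every step."""
    return lambda step: max(step - lag, 0)


def teacher_ages(schedule: Schedule, num_steps: int) -> List[int]:
    """Teacher age = steps elapsed since the teacher's checkpoint was taken."""
    return [step - schedule(step) for step in range(num_steps)]


def isolation_periods(schedule: Schedule, num_steps: int) -> List[int]:
    """Lengths of maximal runs of consecutive steps with an unchanged teacher."""
    indices = [schedule(step) for step in range(num_steps)]
    runs = [1]
    for prev, cur in zip(indices, indices[1:]):
        if cur == prev:
            runs[-1] += 1   # teacher frozen: extend the current run
        else:
            runs.append(1)  # teacher moved: start a new run
    return runs


if __name__ == "__main__":
    n = 12
    for name, sched in [("periodic_copy(4)", periodic_copy(4)),
                        ("lagged_copy(2)", lagged_copy(2))]:
        ages = teacher_ages(sched, n)
        print(name, "mean age:", sum(ages) / n,
              "isolation runs:", isolation_periods(sched, n))
```

In this toy run, `lagged_copy(2)` has a higher mean teacher age than `periodic_copy(4)` (1.75 vs 1.5 steps) yet, after its warm-up, no isolation periods longer than one step, while `periodic_copy(4)` yields three frozen runs of four steps. Schedules that pull the two properties apart like this are what a controlled sweep needs in order to attribute stability to one of them rather than the other.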