AI RESEARCH

Are Full Rollouts Necessary for On-Policy Distillation?

arXiv CS.CL

arXiv:2605.31490v1 Announce Type: new On-policy distillation (OPD) provides dense teacher feedback along rollouts generated by the student and has emerged as a promising post-