AI RESEARCH

OISD: On-Policy Internal Self-Distillation of Language Models

arXiv CS.AI

arXiv:2605.29089v1 Announce Type: cross Recent reinforcement learning (RL) post-