AI RESEARCH
OISD: On-Policy Internal Self-Distillation of Language Models
arXiv CS.AI
arXiv:2605.29089v1 Announce Type: cross Recent reinforcement learning (RL) post-