Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation
arXiv cs.LG
arXiv:2606.02684v1 Announce Type: new
On-policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward selective
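For orientation, here is a minimal sketch of the full-trace KL supervision baseline that the abstract contrasts against. It makes several assumptions that the abstract does not state. It uses per-token reverse KL, KL(student || teacher), at every position of a student-sampled trace and averages over positions. The optional `mask` argument is a hypothetical stand-in for "selective" supervision. The function names are illustrative. None of this is the paper's method, which is not described in the text above.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_reverse_kl(student_logits: Sequence[float],
                     teacher_logits: Sequence[float]) -> float:
    """Assumed per-token loss: KL(student || teacher) = sum_v p_s(v) * (log p_s(v) - log p_t(v))."""
    log_ps = log_softmax(student_logits)
    log_pt = log_softmax(teacher_logits)
    return sum(math.exp(ls) * (ls - lt) for ls, lt in zip(log_ps, log_pt))


def trace_kl_loss(student_trace: Sequence[Sequence[float]],
                  teacher_trace: Sequence[Sequence[float]],
                  mask: Optional[Sequence[float]] = None) -> float:
    """Average per-token reverse KL over one student-sampled trace.

    mask=None supervises every position (full-trace supervision).
    A 0/1 mask is a hypothetical selective variant: only positions marked 1 contribute.
    """
    kls = [token_reverse_kl(s, t) for s, t in zip(student_trace, teacher_trace)]
    if mask is None:
        mask = [1.0] * len(kls)
    if len(mask) != len(kls):
        raise ValueError("mask length must match trace length")
    total = sum(mask)
    if total == 0:
        return 0.0  # nothing selected, so no supervision signal
    return sum(m * k for m, k in zip(mask, kls)) / total


# Usage: two positions; the student matches the teacher only at the first one.
student = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
teacher = [[2.0, 0.0, -1.0], [1.0, 0.0, 0.0]]
full = trace_kl_loss(student, teacher)                 # averages over both positions
selected = trace_kl_loss(student, teacher, mask=[0, 1])  # only the mismatched position
```

Under full-trace supervision, positions where the student already agrees with the teacher still count toward the average and dilute the signal. A mask changes which positions contribute, which is one simple way to picture the shift toward selective supervision that the abstract describes.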