AI RESEARCH

Filter, Then Reweight: Rethinking Optimization Granularity in On-Policy Distillation

arXiv cs.LG

arXiv:2606.02684v1 Announce Type: new On-policy distillation (OPD) in large language models is shifting from full-trace KL supervision toward selective