Physics-Guided Policy Optimization with Self-Distillation

arXiv CS.LG

arXiv:2606.03620v1 Announce Type: new Self-distilled policy optimization (SDPO) has become a popular paradigm for LLM post-