AI RESEARCH
Physics-Guided Policy Optimization with Self-Distillation
arXiv cs.LG
arXiv:2606.03620v1 Announce Type: new Self-distilled policy optimization (SDPO) has become a popular paradigm for LLM post-