AI RESEARCH
ASymPO: Asymmetric-Scale Policy Optimization for Asynchronous LLM Post-Training Without Behavior Information
arXiv cs.LG
arXiv:2606.03070v1 Announce Type: new Asynchronous reinforcement learning can improve language-model post-