AI RESEARCH

ASymPO: Asymmetric-Scale Policy Optimization for Asynchronous LLM Post-Training Without Behavior Information

arXiv CS.LG

arXiv:2606.03070v1 Announce Type: new Asynchronous reinforcement learning can improve language-model post-