AI RESEARCH
Faster Synchronous On-Policy RL via Straggler-Aware Group Sizing
arXiv CS.AI
arXiv:2606.02218v1 Announce Type: cross Synchronous reinforcement learning methods such as Group Relative Policy Optimization (GRPO) provide stable and reproducible on-policy