AI RESEARCH
XRPO: Pushing the limits of GRPO with Targeted Exploration and Exploitation
arXiv CS.LG
arXiv:2510.06672v3 Announce Type: replace. Reinforcement learning algorithms such as GRPO have driven recent advances in large language model (LLM) reasoning. While scaling the number of rollouts stabilizes