
Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

arXiv cs.AI

arXiv:2605.30789v1 (Announce Type: cross)

We identify a new dimension for enhancing rollout diversity in Group Relative Policy Optimization (GRPO) for large language models (LLMs). While GRPO relies on diverse rollouts, prevailing strategies primarily increase diversity by injecting token-level randomness, which may
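For context on why rollout diversity matters: GRPO scores each sampled rollout relative to the other rollouts drawn for the same prompt, not against a learned value function. The sketch below is not taken from this paper. It assumes the common formulation in which each reward is standardized by its group's mean and standard deviation. The helper name `group_relative_advantages`, the use of the population standard deviation, and the `eps` term are illustrative choices. The sketch shows that a group whose rollouts all earn the same reward yields zero advantage, and therefore no learning signal.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Hypothetical helper: standardize each rollout's reward within its group.

    Assumes the common GRPO-style normalization (r - mean) / (std + eps);
    population std and eps are choices of this sketch, not the paper's.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Mixed outcomes within a group produce non-zero advantages (a learning signal).
mixed = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# Identical rewards (e.g. near-duplicate rollouts) produce all-zero advantages.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])

print(mixed)
print(uniform)  # all zeros
```

This is why low-diversity groups are wasteful in GRPO. If sampling keeps producing rollouts that succeed or fail together, the group-relative advantage collapses toward zero regardless of how the diversity was (or was not) injected.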