AI RESEARCH
Spend Your Rollouts Where It Counts: Rollout Allocation for Group-Based RL Post-Training
arXiv CS.AI
arXiv:2605.26606v1 Announce Type: cross Reinforcement learning (RL) is the dominant paradigm for post-