Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection
arXiv CS.LG
arXiv:2605.28631v1 Announce Type: new
Reinforcement learning with verifiable rewards (RLVR) can yield large reasoning gains from very few