AI RESEARCH

Single-Rollout Hidden-State Dynamics for Training-Free RLVR Data Selection

arXiv cs.LG

arXiv:2605.28631v1 Announce Type: new Reinforcement learning with verifiable rewards (RLVR) can yield large reasoning gains from very few