AI RESEARCH
Scalable On-Policy Reinforcement Learning via Adaptive Batch Scaling
arXiv cs.AI
arXiv:2605.21557v1 Announce Type: cross Conventional wisdom holds that large-batch