AI RESEARCH

BranPO: Scalable Contrastive Branch Sampling for Long-Horizon Agentic Reinforcement Learning

arXiv cs.CL

arXiv:2602.03719v2 Announce Type: replace Agentic reinforcement learning enables large language models to perform multi-turn planning and tool use, but long-horizon