AI RESEARCH
BranPO: Scalable Contrastive Branch Sampling for Long-Horizon Agentic Reinforcement Learning
arXiv cs.CL
arXiv:2602.03719v2 Announce Type: replace
Agentic reinforcement learning enables large language models to perform multi-turn planning and tool use, but long-horizon