AI RESEARCH
Learning When Not to Act: Mitigating Tool Abuse in Agentic Reinforcement Learning
arXiv CS.AI
arXiv:2606.02132v1 Announce Type: new
Agentic reinforcement learning can induce tool abuse, where models overuse external tools even for queries solvable by internal reasoning. Existing approaches mitigate this issue with uniform tool-use penalties or hard limits, which reduce tool frequency but may also suppress useful tool-assisted exploration. We propose EAPO, an Efficient Agentic Policy Optimization framework that learns selective tool use