Learning When Not to Act: Mitigating Tool Abuse in Agentic Reinforcement Learning

arXiv:2606.02132v1 Announce Type: new Agentic reinforcement learning can induce tool abuse, where models overuse external tools even for queries solvable by internal reasoning. Existing approaches mitigate this issue with uniform tool-use penalties or hard limits, which reduce tool frequency but may also suppress useful tool-assisted exploration. We propose EAPO, an Efficient Agentic Policy Optimization framework that learns selective tool use