Subagents Account for Most Token Costs in Long Agent Runs: Fixes That Cut Usage 70 to 90 Percent in Practice

Running multi-turn or multi-agent AI sessions? There is a consistent degradation pattern across tools: context fills with repeated history, tool schemas, and subagent handoffs. A 2026 paper by Bai studying SWE-bench across eight frontier models found agentic coding tasks consume roughly 1000x tokens than ordinary chat, with 30x variance on identical tasks. Accuracy does not rise with spend. In one tracked research synthesis run I observed context hit 450,000 tokens. The agent dropped early constraints, re-queried sources already in history, and required manual reset.