AI RESEARCH

Stateful Inference for Low-Latency Multi-Agent Tool Calling

arXiv cs.LG

arXiv:2605.26289v1 Announce Type: new Multi-agent tool calling is becoming the dominant interaction pattern for LLM-based systems, yet existing inference frameworks treat each tool call as an independent request, re-processing the entire conversation from scratch even though 85-95% of the prompt is unchanged from the previous turn.
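
To make the redundancy concrete, the following is a minimal sketch of prefix reuse across turns. It is not the paper's method. The names `shared_prefix_len` and `PrefixReusingSession`, the integer token IDs, and the 90-token history with a 10-token tool result are all hypothetical, chosen only so that the reused share falls inside the 85-95% range quoted above. A real system would retain the attention key/value state for the shared prefix; this toy only counts tokens.

```python
from typing import List, Sequence


def shared_prefix_len(prev: Sequence[int], curr: Sequence[int]) -> int:
    """Return the number of leading tokens that prev and curr have in common."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


class PrefixReusingSession:
    """Toy session (hypothetical) that remembers the last prompt and counts fresh tokens."""

    def __init__(self) -> None:
        self.cached_tokens: List[int] = []
        self.tokens_processed = 0

    def step(self, prompt_tokens: Sequence[int]) -> int:
        """Record a new prompt and return how many tokens needed fresh computation."""
        keep = shared_prefix_len(self.cached_tokens, prompt_tokens)
        fresh = len(prompt_tokens) - keep
        self.cached_tokens = list(prompt_tokens)
        self.tokens_processed += fresh
        return fresh


# Assumed two-turn exchange: turn 2 appends a 10-token tool result to a 90-token history.
session = PrefixReusingSession()
turn1 = list(range(90))
turn2 = turn1 + list(range(1000, 1010))

first = session.step(turn1)   # nothing cached yet, so every token is fresh
second = session.step(turn2)  # only the appended tokens are fresh
reuse_fraction = 1 - second / len(turn2)

print(first, second, round(reuse_fraction, 2))
```

A stateless framework would process 90 + 100 = 190 tokens over these two turns, while the stateful session processes 90 + 10 = 100, and the gap widens with every additional tool call.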