Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability

arXiv:2606.01365v1 Announce Type: new

Tool-using multi-agent large language model (LLM) systems spend computation through model tokens, tool calls, retries, and code execution before producing an answer. When a run fails, final-answer evaluation reveals the endpoint but usually not the point at which the trajectory stopped making recoverable progress. This paper