SynAE: A Framework for Measuring the Quality of Synthetic Data for Tool-Calling Agent Evaluations

arXiv:2605.22564v1 Announce Type: new

Today, tool-calling agents are commonly evaluated or tested on static datasets of execution traces, including input commands, agent responses, and associated tool calls. However, internal production datasets are often insufficient or unusable for testing; for example, they may contain sensitive or