AvalancheBench: Evaluating Enterprise Data Agents Through Latent World Recovery
arXiv CS.AI
We introduce AvalancheBench, a benchmark for evaluating enterprise data agents through latent world recovery. AvalancheBench improves on existing benchmarks in three ways. Second, it provides ground truth for goal-