LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

arXiv:2605.30434v1 Announce Type: cross Real-world data analysis is inherently iterative, yet existing benchmarks mostly evaluate isolated or short interactive tasks, leaving agents' ability to track evolving analytical context over long horizons untested. We