AI RESEARCH

Why Your Deep Research Agent Fails? On Hallucination Evaluation in Full Research Trajectory

arXiv CS.AI

arXiv:2601.22984v2 Announce Type: replace Diagnosing failure patterns in Deep Research Agents (DRAs) remains a critical challenge. Existing benchmarks predominantly rely on end-to-end evaluation, obscuring intermediate hallucinations that accumulate throughout the research trajectory. To bridge this gap, we propose a shift from outcome-based to process-aware evaluation by auditing hallucinations in the full plan-search-summarize trajectory. We