MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks

arXiv:2605.20729v1

Accurate evaluation of conversational retrieval is pivotal for advancing Retrieval-Augmented Generation (RAG) systems. However, existing conversational retrieval benchmarks suffer from costly, sparse human annotation or rigid, unnatural automated heuristics. To address these challenges, we