AI RESEARCH
DDX-TRACE: A Benchmark for Medical Diagnostic Trajectories in VLMs
arXiv CS.CV
arXiv:2605.23629v1 Announce Type: new Medical diagnosis is not a single prediction from a fully specified vignette. It is a sequential workup: clinicians decide what evidence to obtain, revise a differential diagnosis, and stop when the diagnosis is sufficiently supported. Most medical AI benchmarks instead reveal the relevant context upfront and score only the final answer, making unsupported correct guesses, premature closure, inefficient workups, and poor uncertainty updating invisible. We