VIHD: Visual Intervention-based Hallucination Detection for Medical Visual Question Answering

arXiv:2605.20772v1 Announce Type: new While medical Multimodal Large Language Models (MLLMs) have shown promise in assisting diagnosis, they still frequently generate hallucinated responses that appear linguistically plausible but lack visual evidence. Such hallucinations pose risks to clinical decision-making and necessitate effective detection. Existing