RoboSurg-VQA: A Multimodal Benchmark for Surgical Segmentation-Aware Visual Question Answering

arXiv:2605.23068v1

Reliable visual understanding in robot-assisted and minimally invasive surgery (RMIS/MIS) demands more than accurate masks. In clinical practice, clinicians pose natural-language questions about procedural context, visibility, artefacts, and the presence of anatomical structures and surgical instruments, often under views degraded by occlusion, smoke, bleeding, and specular highlights.