
Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

arXiv cs.AI

arXiv:2605.30557v1 Announce Type: cross

Spatial reasoning is a fundamental capability for vision-language models (VLMs) deployed in real-world environments. However, visual observations are inherently limited representations of a 3D world: occlusion can hide objects entirely, and perspective can make geometric properties such as size and distance misleading.