AI RESEARCH

Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

arXiv cs.CV

arXiv:2605.30161v1 Announce Type: new Vision-language models (VLMs) achieve strong performance on spatial reasoning benchmarks, yet it remains unclear whether this reflects structured 3D understanding or reliance on statistical shortcuts in natural images. We