Why Far Looks Up: Probing Spatial Representation in Vision-Language Models

arXiv:2605.30161v1

Vision-language models (VLMs) achieve strong performance on spatial reasoning benchmarks, yet it remains unclear whether this reflects structured 3D understanding or reliance on statistical shortcuts in natural images. We