AI RESEARCH

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models

arXiv cs.CV • May 28, 2026

arXiv:2605.28132v1 (Announce Type: new). Spatial intelligence requires visual representations that capture both semantic objects and geometric structure in the physical world. To this end, two major pre-…
