AI RESEARCH
Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models
arXiv CS.CV
arXiv:2605.28132v1 Announce Type: new
Spatial intelligence requires visual representations that capture both semantic objects and geometric structure in the physical world. To this end, two major pre-