Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

arXiv:2606.01247v1 Announce Type: new Humans can reproduce the viewpoint specified by a target image through active head and body motion, yet spatial intelligence in foundation models has largely been studied as passive understanding of pre-collected observations. We