When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

arXiv:2602.08236v2 Announce Type: replace-cross Despite rapid progress in MLLMs, visual spatial reasoning remains unreliable when correct answers depend on how a scene would appear under unseen or alternative viewpoints. Recent work addresses this by augmenting reasoning with world models for visual imagination, but it remains poorly understood when imagination is actually necessary, how much of it is beneficial, and when it becomes harmful. In practice, indiscriminate imagination can increase computation and even degrade performance by.