MVISTA-4D: View-Consistent 4D World Model with Test-Time Action Inference for Robotic Manipulation

arXiv:2602.09878v2 Announce Type: replace World-model-based imagine-then-act has become a promising paradigm for robotic manipulation, yet existing approaches typically rely either on purely image-based forecasting or on reasoning over partial 3D geometry, which limits their ability to predict complete 4D scene dynamics.