What-If World: A Causal Benchmark for General World Models in Embodied Scenarios

arXiv:2605.27589v1 Announce Type: new Video generation models are increasingly used as world simulators for tasks like driving and robotic manipulation. What matters in these settings is not whether a single video looks right, but whether the model's output changes when its input changes. We test this by giving a model two prompts describing the same scene with one physical detail varied, and checking whether the two videos diverge the way physics predicts.
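
The paired-prompt test described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the benchmark's actual protocol or metric: a "video" is reduced to a hypothetical tracked scalar per frame (for example, an object's x-position), the names `CounterfactualPair`, `final_displacement`, and `pair_is_consistent` are invented here, and the two toy generators stand in for a real video model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a generated video: one tracked scalar per frame.
Trajectory = List[float]


@dataclass
class CounterfactualPair:
    base_prompt: str     # original scene description
    varied_prompt: str   # same scene with one physical detail changed
    expected_sign: int   # +1, -1, or 0: direction physics predicts for the change


def final_displacement(traj: Trajectory) -> float:
    """Assumed measurement: how far the tracked object moved overall."""
    return traj[-1] - traj[0]


def pair_is_consistent(
    pair: CounterfactualPair,
    generate: Callable[[str], Trajectory],
    measure: Callable[[Trajectory], float] = final_displacement,
    tol: float = 1e-6,
) -> bool:
    """Check that the measured quantity shifts in the expected direction."""
    delta = measure(generate(pair.varied_prompt)) - measure(generate(pair.base_prompt))
    if pair.expected_sign == 0:
        return abs(delta) <= tol          # no change expected
    return delta * pair.expected_sign > tol


def toy_sensitive_model(prompt: str) -> Trajectory:
    """Toy generator whose output depends on the varied detail."""
    speed = 3.0 if "hard" in prompt else 1.0
    return [speed * t for t in range(5)]


def toy_insensitive_model(prompt: str) -> Trajectory:
    """Toy generator that ignores the prompt entirely."""
    return [1.0 * t for t in range(5)]


pair = CounterfactualPair(
    base_prompt="a ball on a table is pushed gently",
    varied_prompt="a ball on a table is pushed hard",
    expected_sign=+1,  # a harder push should move the ball farther
)
```

Under these assumptions, `pair_is_consistent(pair, toy_sensitive_model)` passes while `pair_is_consistent(pair, toy_insensitive_model)` fails, even though each toy output could look plausible when viewed alone. That is the distinction the abstract draws between a video that looks right and a model that responds correctly to a changed input.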