MBench: A Comprehensive Benchmark on Memory Capability for Video World Models

arXiv:2606.00793v1 Announce Type: new Recent advancements in video-based world models have demonstrated an unprecedented ability to synthesize high-fidelity visual sequences. However, a fundamental gap persists between visually plausible video generation and the functional requirements of a world model, particularly in maintaining a stable and reasonable internal state over extended temporal horizons.