WorldKV: Efficient World Memory with World Retrieval and Compression

arXiv:2605.22718v1 Announce Type: new Autoregressive video diffusion models have enabled real-time, action-conditioned world generation. However, sustaining a persistent world, where revisiting a previously seen viewpoint yields consistent content, remains an open problem. Full KV-cache attention preserves this consistency but breaks real-time constraints: memory footprint and attention cost grow linearly with rollout length. Sliding-window inference restores throughput but discards long-term consistency. We propose WorldKV, a