AI RESEARCH

VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion

arXiv CS.AI

arXiv:2605.30351v1 Announce Type: cross Long-rollout causal video diffusion has converged on a fixed-size sliding-window KV cache, and recent work innovates within this layout by changing which tokens occupy the window or how their positions are encoded. The per-head KV layout itself, a dominant contributor to streaming memory and latency, has been left largely unchanged. In this paper, we present the first study of Multi-Head Latent Attention (MLA) in video diffusion.
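
To make the memory argument concrete, the following is a rough back-of-the-envelope sketch comparing a per-head sliding-window KV cache with a single low-rank latent per token, which is the general idea behind MLA. Every number and name here (`window_tokens`, `num_layers`, `num_heads`, `head_dim`, `latent_dim`, and both helper functions) is a hypothetical placeholder chosen for illustration, not a value or API from the paper, and the sketch ignores any separate positional component an MLA variant might also cache.

```python
def per_head_kv_bytes(window_tokens, num_layers, num_heads, head_dim, bytes_per_elem=2):
    """Bytes for a standard cache: one key and one value vector per head, per token, per layer."""
    return window_tokens * num_layers * 2 * num_heads * head_dim * bytes_per_elem


def latent_kv_bytes(window_tokens, num_layers, latent_dim, bytes_per_elem=2):
    """Bytes for a latent cache: one shared low-rank vector per token, per layer.

    Keys and values would be re-projected from this latent at attention time.
    """
    return window_tokens * num_layers * latent_dim * bytes_per_elem


# Hypothetical model and window sizes (assumptions, not taken from the paper).
window_tokens = 8192   # tokens kept in the sliding window
num_layers = 30
num_heads = 12
head_dim = 128
latent_dim = 512       # width of the shared latent

full = per_head_kv_bytes(window_tokens, num_layers, num_heads, head_dim)
latent = latent_kv_bytes(window_tokens, num_layers, latent_dim)
ratio = full / latent

print(f"per-head cache: {full / 2**20:.0f} MiB")
print(f"latent cache:   {latent / 2**20:.0f} MiB")
print(f"reduction:      {ratio:.1f}x")
```

Under these assumed sizes the reduction factor is simply `2 * num_heads * head_dim / latent_dim`, so it is independent of the window length and layer count; the actual savings depend on the latent width a given model can tolerate.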