MuKV: Multi-Grained KV Cache Compression for Long Streaming Video Question-Answering

arXiv:2605.22269v1 Announce Type: new Long streaming video QA remains challenging due to growing visual tokens and limited reasoning length of large language models (LLMs). KV-caching stores the Key-Value (KV) of the historical tokens via LLM prefill and enables efficient streaming QA. However, existing methods cache every one or two frames, causing redundant memory usage and losing fine-grained spatial details within a frame or temporal contexts across frames.
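
To make the KV-caching idea concrete, the following is a minimal toy sketch, not the method proposed in the paper. It assumes a single attention head and uses identity projections as a stand-in for LLM prefill; the names `StreamingKVCache`, `prefill`, `answer`, and `attend` are hypothetical and chosen only for illustration. It shows that keys and values of past frame tokens are stored once and reused when a question arrives, and that the cache grows with every frame that is cached.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over cached keys/values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class StreamingKVCache:
    """Toy cache (hypothetical): K/V of past frame tokens are stored once and reused."""

    def __init__(self):
        self.keys = []
        self.values = []

    def prefill(self, frame_tokens):
        # Assumption: identity projections stand in for the LLM's K and V projections.
        for token in frame_tokens:
            self.keys.append(list(token))
            self.values.append(list(token))

    def answer(self, query):
        # A question attends over everything cached so far, without re-encoding frames.
        return attend(query, self.keys, self.values)


cache = StreamingKVCache()
cache.prefill([[1.0, 0.0], [0.0, 1.0]])  # frame 1: two visual tokens
cache.prefill([[1.0, 1.0]])              # frame 2: one visual token
out = cache.answer([1.0, 0.0])
print(len(cache.keys), out)
```

In this sketch the cache holds one entry per visual token of every cached frame, so memory grows linearly with stream length, which is the redundancy the abstract points to.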