AI RESEARCH

Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression

arXiv CS.AI

arXiv:2605.25085v1 Announce Type: cross We study the rate-distortion limits of online KV cache compression in autoregressive language models, formulating it as sequential Wyner-Ziv source coding on the filtration induced by the model, with the next-step query as decoder side information.
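For orientation, the classical single-shot Wyner-Ziv rate-distortion function that the sequential formulation builds on can be written as below. This is the standard textbook statement, not the paper's own bound. The reading of the symbols is an assumption made here for illustration: $X$ stands for the source to be compressed (loosely, the cached keys and values), $Y$ for side information available only at the decoder (loosely, the next-step query), $U$ for an auxiliary description, $g$ for the decoder's reconstruction map, and $d$ for a distortion measure. How the paper extends this to the model-induced filtration is not stated in the abstract.

```latex
% Classical Wyner-Ziv rate-distortion function (single-shot, textbook form).
% Assumed roles: X = source, Y = decoder-only side information,
% U = auxiliary variable, g = reconstruction map, d = distortion measure.
\[
  R_{\mathrm{WZ}}(D)
  \;=\;
  \min_{\substack{p(u \mid x),\; g \,:\\ \mathbb{E}\left[ d\bigl(X,\, g(U, Y)\bigr) \right] \le D}}
  \bigl[\, I(X; U) - I(Y; U) \,\bigr],
  \qquad U \to X \to Y \ \text{(Markov chain)}.
\]
% Under the Markov chain U - X - Y, the objective equals I(X; U | Y).
```

Under the Markov condition the objective equals the conditional mutual information $I(X; U \mid Y)$, which is why side information at the decoder can only lower the required rate.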