AI RESEARCH
Polynomial Context-Truncation Sensitivity in Autoregressive Language Models: Sequential Wyner-Ziv Bounds for KV Cache Compression
arXiv CS.AI
arXiv:2605.25085v1 Announce Type: cross We study the rate-distortion limits of online KV cache compression in autoregressive language models, formulating it as sequential Wyner-Ziv source coding on the filtration induced by the model, with the next-step query as decoder side information.