AI RESEARCH
GRKV: Global Regression for Training-Free KV Cache Compression in Long-Context LLMs
arXiv CS.CL
arXiv:2605.31105v1 Announce Type: new Large language models (LLMs) with extended context lengths rely on the key-value (KV) cache to attend over prior tokens. However, maintaining the KV cache incurs substantial memory overhead, motivating KV-cache compression methods that enforce a fixed budget through eviction and merging. Modern eviction methods increasingly adopt span-based retention because preserving contiguous spans is empirically effective and better preserves semantic coherence.
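To make the idea of span-based retention under a fixed budget concrete, here is a minimal sketch in plain Python. It is not the GRKV method and not any published algorithm: the function name `span_evict`, the per-token importance scores, the non-overlapping span partition, and the mean-score ranking are all assumptions chosen for illustration.

```python
def span_evict(scores, budget, span_len):
    """Return sorted indices of cache positions to retain.

    Illustrative only (hypothetical helper, not from the paper):
    scores   -- per-token importance scores, e.g. accumulated attention
    budget   -- maximum number of cached tokens to keep
    span_len -- length of each contiguous span
    """
    if span_len < 1:
        raise ValueError("span_len must be at least 1")
    if budget <= 0:
        return []

    n = len(scores)
    # Partition positions into contiguous, non-overlapping spans.
    spans = [(start, min(start + span_len, n)) for start in range(0, n, span_len)]

    # Rank spans by mean importance, highest first.
    def mean_score(span):
        start, end = span
        return sum(scores[start:end]) / (end - start)

    ranked = sorted(spans, key=mean_score, reverse=True)

    # Greedily keep whole spans while they still fit in the budget;
    # everything else is evicted.
    kept = []
    for start, end in ranked:
        if len(kept) + (end - start) <= budget:
            kept.extend(range(start, end))
    return sorted(kept)


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.8, 0.1, 0.0, 0.1, 0.7, 0.6]
    print(span_evict(scores, budget=4, span_len=2))
```

With these example scores, a per-token top-4 selection would keep the scattered positions 1, 2, 6, and 7, whereas the span-based variant keeps two whole spans, so every retained token stays next to its neighbor.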