AI RESEARCH
Value-Aware Stochastic KV Cache Eviction for Reasoning Models
arXiv CS.LG
arXiv:2606.03928v1 Announce Type: new
Reasoning models improve accuracy through extended chains of thought, but their long outputs create a memory and compute bottleneck. KV cache eviction methods reduce this cost by evicting unimportant key-value pairs from the cache, yet they often yield lower accuracy than selection-based sparse attention alternatives, which keep the full KV cache. We identify key factors crucial to the accuracy of KV cache eviction.
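To make the contrast between eviction and selection concrete, here is a minimal Python sketch of the two generic strategies. It is an illustration only and does not implement the paper's value-aware stochastic criterion, which this excerpt does not describe. The helper names (`evict`, `select_and_attend`, `top_indices`), the `importance` scores, and the `budget` parameter are all hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of one query over the given KV pairs."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def top_indices(scores, budget):
    """Indices of the `budget` highest scores, returned in original order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])

def evict(keys, values, importance, budget):
    """Eviction: permanently drop all but `budget` pairs, so the cache shrinks.

    `importance` is a hypothetical per-pair score; how it is computed is
    exactly what distinguishes one eviction method from another.
    """
    keep = top_indices(importance, budget)
    return [keys[i] for i in keep], [values[i] for i in keep]

def select_and_attend(query, keys, values, budget):
    """Selection: the full cache stays in memory; each query picks its own pairs."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    pick = top_indices(scores, budget)
    return attend(query, [keys[i] for i in pick], [values[i] for i in pick])

# Toy cache with three KV pairs and made-up importance scores.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
importance = [0.9, 0.1, 0.5]

small_keys, small_values = evict(keys, values, importance, budget=2)
later_query = [0.0, 1.0]
out_evicted = attend(later_query, small_keys, small_values)
out_selected = select_and_attend(later_query, keys, values, budget=2)
```

In this toy setup, eviction discards the second pair because its importance score was low at eviction time, yet the later query aligns with that pair's key. Selection can still reach it because nothing was deleted, at the price of holding the full cache in memory. This is one plausible reading of why eviction can lose accuracy, not a claim about the paper's analysis.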