AI RESEARCH
Augmenting Attention with Exponentially Decaying Memory Improves Query-Aware KV Sparsity
arXiv cs.LG
arXiv:2605.28640v1 Announce Type: new

Efficient inference is critical for long-context language models, where attention computation and KV-cache access dominate the cost. Recent work