AI RESEARCH

Runtime-Certified Bounded-Error Quantized Attention

arXiv CS.LG • May 21, 2026

arXiv:2605.20868v1 Announce Type: new KV cache quantization reduces the memory cost of long-context LLM inference, but