AI RESEARCH

Runtime-Certified Bounded-Error Quantized Attention

arXiv cs.LG

arXiv:2605.20868v1 Announce Type: new KV cache quantization reduces the memory cost of long-context LLM inference, but