Runtime-Certified Bounded-Error Quantized Attention
arXiv cs.LG
arXiv:2605.20868v1 Announce Type: new
KV cache quantization reduces the memory cost of long-context LLM inference, but