ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs

arXiv:2602.07721v3 Announce Type: replace KV-cache retrieval is essential for long-context LLM inference, yet existing methods struggle with distribution drift and high latency at scale. We