AI RESEARCH

Grounded Cache Routing for Retrieval-Augmented Generation: When Is It Safe to Reuse an Answer?

arXiv CS.AI

ArXi:2605.27494v1 Announce Type: cross Modern retrieval-augmented generation(RAG) deployments increasingly rely on caching to reduce token cost and time-to-first-token(TTFT). Prefix-level KV reuse is now standard in serving stacks such as vLLM, and chunk-level and position-independent reuse have been pushed further by recent systems(RAGCache, TurboRAG, CacheBlend, EPIC, ContextPilot, PCR, LMCache