ObjectCache: Layerwise Object-Storage Retrieval for KV Cache Reuse

arXiv cs.AI

arXiv:2605.22850v1 Announce Type: cross

Prefix KV caching has become a key mechanism in LLM serving: it reduces time to first token (TTFT) by avoiding redundant computation across requests that share a prefix (e.g., a common system prompt). However, the accumulated KV cache often exceeds what GPU memory and local DRAM can hold. To preserve latency, current systems keep the KV cache in remote DRAM pools, which increases serving-cluster size and cost.
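The abstract describes prefix KV caching only at a high level. The sketch below is a hedged illustration of that general mechanism, not of ObjectCache's own design, which this excerpt does not describe. It assumes a block-based scheme similar in spirit to open-source serving engines: the prompt is split into fixed-size token blocks, each block is keyed by a hash chained over all preceding blocks, and a new request reuses the KV blocks of its longest cached prefix so that only the remaining tokens need prefill. `PrefixKVCache`, `block_hashes`, and `BLOCK_SIZE` are hypothetical names, and the stored values are placeholders for real key/value tensors.

```python
from hashlib import sha256
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # tokens per KV block (hypothetical; kept small for the example)


def block_hashes(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Chain-hash each full block so a block's key depends on its entire prefix."""
    hashes: List[str] = []
    parent = ""
    full = len(tokens) - len(tokens) % block_size  # trailing partial block is not cached
    for start in range(0, full, block_size):
        block = tokens[start:start + block_size]
        key = parent + ":" + ",".join(map(str, block))
        parent = sha256(key.encode()).hexdigest()
        hashes.append(parent)
    return hashes


class PrefixKVCache:
    """Toy prefix cache mapping chained block hashes to placeholder KV payloads."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.store: Dict[str, str] = {}  # unbounded: stands in for any storage tier

    def lookup(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached KV blocks."""
        hits = 0
        for h in block_hashes(tokens, self.block_size):
            if h not in self.store:
                break  # first miss ends the reusable prefix
            hits += 1
        return hits * self.block_size

    def serve(self, tokens: List[int]) -> Tuple[int, int]:
        """Simulate prefill: reuse the cached prefix, 'compute' the rest, then
        insert any new full blocks. Returns (reused_tokens, computed_tokens)."""
        reused = self.lookup(tokens)
        for h in block_hashes(tokens, self.block_size):
            self.store.setdefault(h, f"kv-{h[:8]}")  # placeholder for K/V tensors
        return reused, len(tokens) - reused


if __name__ == "__main__":
    system_prompt = list(range(100, 112))  # 12-token shared prefix
    cache = PrefixKVCache()
    print(cache.serve(system_prompt + [1, 2, 3]))  # (0, 15): cold cache
    print(cache.serve(system_prompt + [7, 8, 9, 10, 11]))  # (12, 5): prefix reused
```

With a 12-token shared system prompt and a block size of 4, the second request reuses all three prompt blocks and computes only its 5-token suffix. Chaining the hash makes each block's key depend on everything before it, so identical tokens at different positions are never confused. The unbounded dictionary stands in for whichever tier actually holds the blocks; in a real deployment, that growth is what pushes the KV cache beyond GPU memory and local DRAM, as the abstract notes.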