Leyline: KV Cache Directives for Agentic Inference

arXiv CS.AI

arXiv:2606.01065v1 Announce Type: cross

Modern KV cache management assumes the chatbot workload: prompts arrive once and the cache grows append-only, so prefix caching and forward-only eviction are correct by construction. Agentic LLMs break this assumption. Their conversations evolve through policy-driven editing: failed tool calls are retried, stale outputs are dropped, and trajectories are pivoted. Two distinct cache problems result.
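To make the append-only assumption concrete, here is a minimal Python sketch of plain prefix reuse. It is not Leyline's mechanism, which this excerpt does not describe, and the helper `reusable_prefix_len` and the toy token IDs are hypothetical. Under causal attention, each cached key/value entry depends on every earlier token, so a standard prefix cache can reuse entries only up to the first position where the new prompt diverges from the cached one.

```python
from typing import List


def reusable_prefix_len(cached: List[int], new: List[int]) -> int:
    """Count leading tokens whose cached KV entries can be reused unchanged.

    Hypothetical helper for illustration. The KV entry at position i depends
    on tokens 0..i, so reuse stops at the first position where the cached
    token sequence and the new prompt differ.
    """
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


# Toy token IDs standing in for a conversation that is already cached.
history = [1, 2, 3, 4, 5, 6]

# Chatbot-style turn: the new prompt only appends, so every cached entry is reusable.
appended = history + [7, 8]
print(reusable_prefix_len(history, appended))

# Agentic-style edit: a failed tool output (tokens 3, 4) is replaced by a retry (token 9).
edited = [1, 2, 9, 5, 6, 7]
print(reusable_prefix_len(history, edited))
```

The appended turn reuses all six cached positions, while the edit invalidates everything from position 2 onward under plain prefix reuse, including tokens 5 and 6, which survived the edit unchanged.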