AI RESEARCH

Probing the Prompt KV Cache: Where It Becomes Dispensable

arXiv CS.CL

arXiv:2605.30574v1 Announce Type: new

Prior KV cache compression schemes demonstrate empirically that the prompt cache is partially redundant during decoding: entries can be dropped or summarised with little accuracy loss. We ask when this redundancy arises and what kind it is: at which layers, after how many decoding steps, and in what form can the prompt-span KV cache be replaced without breaking the task? A controlled splice intervention, swept over layer cutoff and decoding steps, shows that this redundancy is about form (chat template scaffolding) rather than content.
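To make the idea of a splice swept over layer cutoff and decoding steps concrete, here is a minimal sketch of one possible reading, not the authors' implementation. Everything in it is an assumption for illustration: the cache is modelled as a plain nested list indexed `[layer][position]`, the replacement entries come from a hypothetical `donor` cache, the splice is applied to layers at or above the cutoff (the abstract does not state the direction), and `evaluate` is a hypothetical callback standing in for whatever task metric is used.

```python
from copy import deepcopy


def splice_prompt_cache(cache, donor, prompt_span, layer_cutoff):
    """Return a copy of `cache` with prompt-span entries taken from `donor`.

    Assumed convention (not from the source): layers with index
    >= layer_cutoff are spliced, lower layers are left untouched.
    """
    start, end = prompt_span
    spliced = deepcopy(cache)  # never mutate the original cache
    for layer in range(layer_cutoff, len(cache)):
        spliced[layer][start:end] = deepcopy(donor[layer][start:end])
    return spliced


def sweep(cache, donor, prompt_span, decode_steps_grid, evaluate):
    """Grid over (layer cutoff, decoding step) pairs.

    `evaluate(spliced_cache, step)` is a hypothetical callback that would
    swap in the spliced cache after `step` decoding steps and score the task.
    """
    results = {}
    for cutoff in range(len(cache) + 1):  # cutoff == len(cache) splices nothing
        spliced = splice_prompt_cache(cache, donor, prompt_span, cutoff)
        for step in decode_steps_grid:
            results[(cutoff, step)] = evaluate(spliced, step)
    return results


def make_toy_cache(tag, n_layers=3, n_positions=4):
    """Toy stand-in for a KV cache: one labelled string per (layer, position)."""
    return [[f"{tag}-L{l}-P{p}" for p in range(n_positions)]
            for l in range(n_layers)]


original = make_toy_cache("orig")
donor = make_toy_cache("donor")

# Splice positions 0..1 (the assumed prompt span) in layers 1 and 2.
spliced = splice_prompt_cache(original, donor, prompt_span=(0, 2), layer_cutoff=1)


def count_donor_entries(cache, step):
    """Dummy metric: how many entries came from the donor (ignores `step`)."""
    return sum(entry.startswith("donor") for layer in cache for entry in layer)


grid = sweep(original, donor, (0, 2), decode_steps_grid=[0, 8],
             evaluate=count_donor_entries)
```

In a real experiment the nested lists would be per-layer key and value tensors, and the dummy metric would be replaced by task accuracy measured after resuming decoding from the spliced cache.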