xKV: Cross-Layer KV-Cache Compression via Aligned Singular Vector Extraction

arXiv:2503.18893v2 Announce Type: replace-cross Long-context Large Language Models (LLMs) enable powerful applications but incur high memory costs due to the key-value states (KV-Cache). Recent studies attempt to share the KV-Cache across layers, but these approaches either require expensive pre