
Latent Cache Flow: Model-to-Model Communication Without Text

arXiv cs.LG

arXiv:2605.22863v1

LLM agents today communicate via text. This incurs considerable latency and information loss, because the sharer model's state must be autoregressively decoded into tokens, which the receiver model must then re-encode. Recent work such as Cache-to-Cache (C2C; Fu, 2026) instead exchanges KV caches directly by learning adapters that translate the sharer's KV matrices into the receiver model's space. However, these adapters are large and expensive to train. They also translate individual tokens, so the receiver's context must be identical to the sharer's.
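The abstract does not describe C2C's adapter architecture. The following is a minimal sketch built on two stated assumptions: a hypothetical linear per-token projection and a simple gated blend, neither of which is C2C's actual design. It illustrates two points from the abstract. First, token-wise translation pairs position i of one cache with position i of the other, so both contexts must line up. Second, adapter size grows with layer count and with both models' hidden widths. All function names (`translate_kv`, `fuse_caches`, `adapter_param_count`) and dimensions are illustrative.

```python
from typing import List

# Rows are token positions, columns are hidden features of one K or V matrix.
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    if not a or len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    cols = list(zip(*b))  # columns of b
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def translate_kv(sharer_kv: Matrix, weight: Matrix) -> Matrix:
    """Hypothetical per-token adapter: each sharer token's d_s-dim row is
    projected independently into the receiver's d_r-dim space."""
    return matmul(sharer_kv, weight)


def fuse_caches(receiver_kv: Matrix, translated_kv: Matrix, gate: float = 0.5) -> Matrix:
    """Hypothetical gated blend of translated sharer entries into the receiver
    cache. Position i of one cache is paired with position i of the other,
    so the two contexts must be token-aligned."""
    if len(receiver_kv) != len(translated_kv):
        raise ValueError("token-wise fusion needs equal-length, aligned contexts")
    return [
        [(1.0 - gate) * r + gate * t for r, t in zip(r_row, t_row)]
        for r_row, t_row in zip(receiver_kv, translated_kv)
    ]


def adapter_param_count(n_layers: int, d_sharer: int, d_receiver: int) -> int:
    """Weights for one K and one V projection per layer (biases ignored)."""
    return 2 * n_layers * d_sharer * d_receiver


if __name__ == "__main__":
    sharer = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 tokens, d_s = 2
    w = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]          # projects d_s = 2 to d_r = 3
    receiver = [[0.0, 0.0, 0.0] for _ in range(3)]  # 3 tokens, d_r = 3
    print(fuse_caches(receiver, translate_kv(sharer, w)))
    print(adapter_param_count(n_layers=32, d_sharer=4096, d_receiver=4096))
```

The dimensions in the `__main__` block are chosen for illustration and are not taken from any particular model. With 32 layers and a hidden width of 4096 on both sides, the projection matrices alone come to 1,073,741,824 weights. A receiver cache whose length differs from the sharer's is rejected outright by `fuse_caches`, which mirrors the alignment constraint the abstract describes.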