Parallel Context Compaction for Long-Horizon LLM Agent Serving

arXiv:2605.23296v1

Long-horizon LLM agents accumulate conversation histories that eventually exceed the model's context window. Context compaction via LLM-based summarization keeps the conversation bounded, but summarization is inherently lossy, and the blocking call stalls agent inference for tens of seconds.
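
To make the problem concrete, the following is a minimal sketch of the blocking compaction pattern described above, not the method proposed in the paper. All names (`count_tokens`, `compact_if_needed`, `toy_summarize`, `budget`, `keep_recent`) are hypothetical, the token counter is a crude word count, and the summarizer is a placeholder standing in for a real LLM call.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    return len(text.split())


def compact_if_needed(
    history: List[str],
    budget: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace older turns with one summary once the history exceeds the budget."""
    total = sum(count_tokens(turn) for turn in history)
    split = len(history) - keep_recent
    if total <= budget or split <= 0:
        return history  # still within budget, or nothing old enough to compact
    old, recent = history[:split], history[split:]
    # Synchronous call: the agent cannot take its next step until this returns.
    summary = summarize(old)
    return [summary] + recent


def toy_summarize(turns: List[str]) -> str:
    # Hypothetical stand-in for an LLM summarizer: keeps only the first three
    # words of each turn, so detail is lost (the summary is lossy).
    return "SUMMARY: " + " | ".join(" ".join(t.split()[:3]) for t in turns)


history = [
    "user asks about deploying the service to staging",
    "agent runs the deploy script and reports success",
    "user asks for logs",
    "agent fetches logs",
]
compacted = compact_if_needed(history, budget=10, keep_recent=2, summarize=toy_summarize)
```

In this sketch the two oldest turns collapse into a single summary entry while the two most recent turns are kept verbatim. With a real model behind `summarize`, the synchronous call is where the stall described in the abstract would occur.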