AI RESEARCH

AgentIR: A Workload-Adaptive Cascade Retrieval Substrate for Long-Term Conversational Memory

arXiv cs.CL

arXiv:2605.25092v1 Announce Type: cross

Long-term conversational memory is a retrieval workload that classical IR was not built for: the index grows during the query stream, query types shift intra-session, and the latency budget per retrieval is sub-10 ms. Lucene-class engines treat the index as static and each query as stateless, which leaves this structure unexploited. AgentIR instead treats fusion as a per-query decision along two axes: which fusion strategy to apply (BM25, dense, RRF, or agent-aware RRF), and whether the ~52 ms dense channel, more than five times the per-retrieval budget, is worth running at all.
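To make the two decision axes concrete, here is a minimal sketch assuming a simple structure: the cheap BM25 channel always runs, a router picks a strategy per query, and the expensive dense channel is invoked only when the chosen strategy needs it. `rrf_fuse` is standard Reciprocal Rank Fusion with the commonly used constant k = 60. The names `retrieve` and `choose_strategy`, the stub channels, and the toy routing rule are hypothetical illustrations, not AgentIR's actual API or policy. Agent-aware RRF is omitted because the abstract does not define it.

```python
from typing import Callable, Dict, List, Sequence

Ranking = List[str]  # document ids, best first


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> Ranking:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
    with ranks starting at 1. Documents missing from a ranking get no score from it."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def retrieve(
    query: str,
    bm25: Callable[[str], Ranking],
    dense: Callable[[str], Ranking],
    choose_strategy: Callable[[str, Ranking], str],
) -> Ranking:
    """Hypothetical per-query cascade (not AgentIR's actual policy).

    Axis 1: the router chooses "bm25", "dense", or "rrf".
    Axis 2: the costly dense channel runs only if that choice requires it.
    """
    lexical = bm25(query)  # cheap channel always runs; its hits inform the router
    strategy = choose_strategy(query, lexical)
    if strategy == "bm25":
        return lexical  # dense channel is never called for this query
    dense_hits = dense(query)
    if strategy == "dense":
        return dense_hits
    if strategy == "rrf":
        return rrf_fuse([lexical, dense_hits])
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    # Stub channels standing in for real BM25 and dense retrievers.
    bm25_stub = lambda q: ["a", "b", "c"]
    dense_stub = lambda q: ["c", "a", "d"]
    # Toy router (assumption): pay for dense retrieval only when BM25 looks thin.
    router = lambda q, hits: "bm25" if len(hits) >= 3 else "rrf"
    print(retrieve("where did we park?", bm25_stub, dense_stub, router))
    # prints ['a', 'b', 'c'] because the router skips the dense channel
    print(rrf_fuse([bm25_stub(""), dense_stub("")]))
    # prints ['a', 'c', 'b', 'd']
```

The design point the sketch illustrates is that skipping the dense channel is decided before it is called, so a query routed to BM25 never pays the dense channel's latency; a real router would replace the toy length check with whatever per-query signal the system actually uses.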