AI RESEARCH

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

arXiv cs.AI

arXiv:2605.30120v1 Announce Type: cross

Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity imposes prohibitive storage and retrieval-efficiency bottlenecks: to manage the immense memory footprint and computational overhead of billion-scale token vectors, state-of-the-art systems are forced to rely on aggressive dimension reduction and complex clustering (e.g., K-means). This compromise