Connecting the Dots: Benchmarking Reflective Memory in Long-Horizon Dialogue

arXiv:2606.01223v1 Announce Type: cross Despite substantial progress in long-context modeling, existing benchmarks remain confined to factual memory for explicit recall, failing to measure the reflective memory required to synthesize fragmented, multimodal cues into high-level interpretations. To address this gap, we