MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

arXiv:2505.23764v3 Announce Type: replace-cross Spatial intelligence is essential for multimodal large language models (MLLMs) operating in the complex physical world. Existing benchmarks, however, probe only single-image relations and thus fail to assess the multi-image spatial reasoning that real-world deployments demand. We