DMC-CF: Dynamic Multimodal CounterFactual QA benchmark for Causal Reasoning

arXiv:2605.29339v1 Announce Type: new With the rapid advancement of multimodal large language models (MLLMs), models have demonstrated increasingly powerful multimodal capabilities. However, whether MLLMs trained through statistical learning can truly understand the causal relationships underlying the real world remains a key research question. In recent years, numerous multimodal causal reasoning datasets have been proposed. Nevertheless, these datasets are either limited in scale or constructed from synthetic images and videos, cartoon-based content, or other non-realistic multimodal sources.