AI RESEARCH
Seeing vs. Believing: Evaluating the Language Bias of Open-Source MLLMs in Counter-Intuitive Scenes
arXiv CS.AI
arXiv:2601.07737v2 Announce Type: replace-cross Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in mainstream visual understanding tasks, but their ability to process action scenes that contradict everyday common sense remains undertested. To address this gap, we