Understanding the Effects of Distractors on Reasoning Vision-Language Models

arXiv:2511.21397v2 Announce Type: replace-cross How does irrelevant information (i.e., distractors) affect test-time scaling in vision-language models (VLMs)? Prior work on text-only language models has shown that textual distractors can intensify inverse scaling, causing models to produce longer but less effective reasoning traces. In this work, we investigate whether similar phenomena arise in multimodal settings. We