Faithful-MR1: Faithful Multimodal Reasoning via Anchoring and Reinforcing Visual Attention

arXiv:2605.22072v1 Announce Type: new Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising paradigm for advancing complex reasoning in large language models, and recent work extends RLVR to multimodal large language models (MLLMs). This transfer, however, surfaces a faithfulness challenge: models must faithfully perceive task-relevant visual evidence and faithfully use that evidence during reasoning. When they do not, gains on multimodal benchmarks are unsatisfactory.