Beyond Text Following: Repairable Arbitration Reversals in Audio-Language Models

arXiv:2606.05161v1 Announce Type: cross

Audio-language models (ALMs) often follow text that conflicts with the audio, even when the audio evidence is clear. This raises a basic question: is the audio-grounded answer unavailable to the model, or is it represented but overridden by the conflicting text? We examine this question with a same-audio counterfactual, which keeps the audio fixed, removes only the conflicting text, and measures the resulting shift in model preference.
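The counterfactual comparison can be illustrated with a minimal sketch. It assumes, as an illustration only, that "model preference" is the log-probability margin between the audio-grounded answer and the text-implied answer. The names `AnswerScores`, `preference_margin`, `counterfactual_shift`, and `is_reversal`, as well as the reversal criterion, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AnswerScores:
    """Hypothetical log-probabilities a model assigns to two candidate answers."""
    audio_answer: float  # answer supported by the audio evidence
    text_answer: float   # answer implied by the conflicting text


def preference_margin(scores: AnswerScores) -> float:
    """Positive when the model prefers the audio-grounded answer (assumed metric)."""
    return scores.audio_answer - scores.text_answer


def counterfactual_shift(with_text: AnswerScores, without_text: AnswerScores) -> float:
    """Change in preference when only the conflicting text is removed.

    Both conditions are assumed to use the identical audio input
    (the same-audio counterfactual), so any shift is attributed to the text.
    """
    return preference_margin(without_text) - preference_margin(with_text)


def is_reversal(with_text: AnswerScores, without_text: AnswerScores) -> bool:
    """Hypothetical criterion: the text answer wins under conflict, but the
    audio answer wins once the conflicting text is removed."""
    return preference_margin(with_text) < 0 < preference_margin(without_text)


# Usage with made-up scores for a single example.
with_text = AnswerScores(audio_answer=-2.0, text_answer=-0.5)
without_text = AnswerScores(audio_answer=-0.25, text_answer=-1.75)
print(counterfactual_shift(with_text, without_text))  # 3.0
print(is_reversal(with_text, without_text))           # True
```

Under this reading, a large positive shift together with a sign flip suggests the audio-grounded answer was represented but overridden, rather than unavailable.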