Your Multimodal Speech Model Says I Have a Face for Radio

ArXi:2605.30472v1 Announce Type: new As large neural models have become better at language tasks, researchers are increasingly building multi- and omnimodal models that handle modalities of data. One example is the expansion of speech recognition models to audio-visual data for noise mitigation and multimodal subtitling. While performance and bias have been studied extensively in the single-modality regime, it is unknown how new modalities affect this, even though they produce biases in humans. We. therefore.