Entity Binding Failures in Speech LLM Reasoning: Diagnosis and Chain-of-Thought Intervention

arXiv:2606.04474v1

Speech Large Language Models (SLLMs) underperform their text counterparts on complex reasoning. We show that this modality gap is not a uniform cognitive deficit. Evaluating three diverse SLLMs, we find that speech-to-text (S2T) performance matches or exceeds text-to-text (T2T) performance on spatial, syntactic, and factual tasks. However, on logical tasks that require entity tracking, S2T accuracy collapses to chance level.
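The comparison described above has two parts: a per-category gap between T2T and S2T accuracy, and a check of whether S2T accuracy is distinguishable from chance. The abstract does not say which statistical criterion the authors used. The sketch below assumes a one-sided exact binomial test against a per-category chance rate (for example 0.5 for hypothetical two-option logical questions). All function names are hypothetical and only illustrate the idea.

```python
import math


def accuracy(correct: int, total: int) -> float:
    """Fraction of items answered correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def modality_gap(t2t_acc: float, s2t_acc: float) -> float:
    """T2T minus S2T accuracy; positive means speech input underperforms text."""
    return t2t_acc - s2t_acc


def p_value_above_chance(correct: int, total: int, chance: float) -> float:
    """One-sided exact binomial test (assumed criterion, not from the paper):
    P(X >= correct) for X ~ Binomial(total, chance)."""
    return sum(
        math.comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


def at_chance(correct: int, total: int, chance: float, alpha: float = 0.05) -> bool:
    """True if accuracy is NOT significantly above the chance rate at level alpha."""
    return p_value_above_chance(correct, total, chance) >= alpha
```

For instance, 5 correct answers out of 10 two-option items give a p-value of 638/1024 (about 0.62), so `at_chance(5, 10, 0.5)` returns `True`. With small item counts, an exact test avoids relying on a normal approximation.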