AI RESEARCH

DEPART: DEcomposing PARiTy across Multilingual LLMs

arXiv cs.AI

arXiv:2605.28163v1 Announce Type: cross Multilingual Large Language Model (mLLM) leaderboards report per-language accuracy but rarely explain why disparities emerge, leaving systemic biases unattributed and offering practitioners no actionable levers. We first establish that these gaps are systematic rather than artifacts of sampling noise via distribution-free Friedman and Kruskal-Wallis tests, then