Benchmarking AI for low-resource contexts: Thinking beyond leaderboards

arXiv:2605.28508v1 Announce Type: new Existing AI evaluation practices often fail to capture how systems actually perform in low-resource environments, where operational constraints shape usability as much as model quality. Through a structured analysis of existing benchmark families across speech, chat/RAG, and vision systems, we identify critical gaps between laboratory evaluation practices and real-world deployment conditions in these settings.