Building Arabic NLP from the Ground Up: Twenty Years of Lessons, Failures, and Open Problems

arXiv:2605.20786v1 Announce Type: new This paper reflects on twenty years of building NLP resources and research infrastructure for Arabic, a language spoken by hundreds of millions yet historically underserved relative to languages such as English or Chinese. The first decade focused on foundational linguistic infrastructure; the second shifted toward computational social science, social media analysis, and socially oriented applications. Rather than cataloguing outputs, the paper examines what the experience of building them revealed.