AI RESEARCH

Real-time multilingual ASR using rolling buffers and monolingual models [P]

r/MachineLearning

I built a routing-based approach to lightweight real-time multilingual ASR as part of my research at Gladia. The core problem was how multilingual models that accurately handle mid-conversation language switches are often too big for most local hardware and have poor accuracy. So rather than relying on one massive multilingual model, the system routes audio between smaller, specialized monolingual models (~100M parameters each