Bandwidth-Efficient and Privacy-Preserving Edge-Cloud Many-to-Many Speech Translation

arXiv:2605.28642v1

Multimodal large language models (MLLMs) have demonstrated significant potential for speech-to-text translation (S2TT). However, existing deployment paradigms face critical challenges: purely on-device models are limited by resource constraints, while centralized cloud systems incur severe privacy risks and bandwidth bottlenecks because they transmit raw voice data. Furthermore, most models exhibit English-centric biases, which restrict their scaling to many-to-many translation.