Routing-Aligned Fine-Tuning for Multilingual Downstream Tasks in Mixture-of-Experts Models

arXiv:2605.28306v1 Announce Type: cross Mixture-of-Experts (MoE) models have emerged as a dominant paradigm for efficient LLM scaling, yet adapting them to non-English downstream tasks remains challenging. Existing fine-tuning approaches treat MoE models as monolithic learners, ignoring the heterogeneous routing structure that develops during pre