SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning

arXiv:2602.01990v2 Announce Type: replace-cross

Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, but real-world deployment requires them to continually expand their capabilities, making Multimodal Continual Instruction Tuning (MCIT) essential. Recent methods leverage sparse expert routing to promote task specialization, but we find that the expert routing process suffers from drift as the data distribution evolves.
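
To make the routing-drift observation concrete, the following is a minimal illustrative sketch, not the method proposed in the paper. It assumes a toy linear router with top-k selection; the names `top_k_route`, `router_before`, `router_after`, and all weight values are hypothetical. It shows one way drift can appear: once the router weights shift during training on a later task, the same earlier-task input is sent to a different expert.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(token, router_weights, k=1):
    """Return the indices of the k experts with the highest router probability.

    Assumption: the router is a single linear layer (one weight row per expert)
    followed by a softmax, as in common sparse mixture-of-experts layers.
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]


# Hypothetical toy setup: 2 experts, 2-dimensional token features.
router_before = [[1.0, 0.0], [0.0, 1.0]]  # router after learning an early task
router_after = [[0.2, 0.0], [0.0, 1.0]]   # same router after a later task

old_task_token = [1.0, 0.5]  # an input drawn from the early task

expert_before = top_k_route(old_task_token, router_before)
expert_after = top_k_route(old_task_token, router_after)

print("expert before later-task training:", expert_before)
print("expert after later-task training: ", expert_after)
```

In this toy case the early-task token moves from expert 0 to expert 1 even though the input itself is unchanged, so the expert that specialized on that task is no longer the one consulted.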