DOT-MoE: Differentiable Optimal Transport for MoEfication

arXiv cs.AI

arXiv:2606.01666v1 Announce Type: cross

The scaling of Large Language Models (LLMs) has driven significant performance gains but has also created substantial challenges in inference efficiency. While Mixture-of-Experts (MoE) architectures address this by decoupling model size from inference cost