Pruning and Distilling Mixture-of-Experts into Dense Language Models

arXiv:2605.28207v1 Announce Type: cross Mixture-of-Experts (MoE) is now the dominant architecture for frontier language models, yet inference requires every expert's parameters to be loaded in memory, which makes MoE models poorly suited to memory-constrained deployment. Existing compression methods reduce the number of experts, but the result is still an MoE model with the same fundamental limitation.
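To make the memory argument concrete, the following is a minimal sketch of the arithmetic for a single MoE feed-forward layer. The layer sizes, expert count, top-k routing value, and the helper names `ffn_params` and `moe_layer_memory` are hypothetical and chosen only for illustration; they are not taken from the paper. Router weights and biases are ignored.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Parameter count of one expert, assumed to be a two-matrix FFN (no biases)."""
    return 2 * d_model * d_ff


def moe_layer_memory(d_model: int, d_ff: int, num_experts: int,
                     top_k: int, bytes_per_param: int = 2) -> tuple[int, int]:
    """Return (resident_bytes, active_bytes) for one MoE layer.

    resident_bytes: every expert must be loaded, whichever ones the router picks.
    active_bytes:   only the top_k experts selected for a token are actually used.
    """
    per_expert = ffn_params(d_model, d_ff)
    resident = num_experts * per_expert * bytes_per_param
    active = top_k * per_expert * bytes_per_param
    return resident, active


# Hypothetical configuration: 8 experts, 2 active per token, 16-bit weights.
resident, active = moe_layer_memory(d_model=4096, d_ff=14336,
                                    num_experts=8, top_k=2)
print(f"resident: {resident / 2**20:.0f} MiB")
print(f"active per token: {active / 2**20:.0f} MiB")
print(f"ratio: {resident / active:.1f}x")
```

Under these assumed numbers, the layer holds four times as many expert weights in memory as it uses for any single token. Removing some experts lowers the resident figure, but the model still has to keep every remaining expert loaded.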