AI RESEARCH
ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression
arXiv cs.AI
arXiv:2605.29350v1 Announce Type: new Mixture-of-Experts (MoE) language models reduce per-token computation but still require storing and serving all experts, making deployment memory-intensive. Existing post-