AI RESEARCH

BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization

arXiv cs.AI

arXiv:2606.00079v1 Announce Type: cross

Mixture-of-Experts (MoE) large language models reduce per-token computation through sparse expert activation, but their deployment remains memory-intensive because all expert weights must be kept resident in memory. Existing MoE compression methods struggle in the ultra-low-bit regime: pruning irreversibly removes model capacity, while coarse-grained quantization fails to allocate bits according to heterogeneous expert and weight-direction importance. We propose BitsMoE, a spectral-energy-guided bit-allocation framework for MoE LLM quantization.
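The abstract does not spell out the allocation rule, so the following is only a rough sketch of what energy-guided bit allocation can look like in general, not BitsMoE's actual algorithm. It assumes "spectral energy" means the squared singular values of an expert's weight matrix, and it uses the textbook greedy rule from rate-distortion theory (each extra bit cuts quantization error by roughly a factor of 4). The function names `spectral_energy` and `allocate_bits`, and the parameters `avg_bits`, `min_bits`, and `max_bits`, are hypothetical.

```python
import numpy as np


def spectral_energy(weight):
    """Squared singular values of a weight matrix.

    Their sum equals the squared Frobenius norm of the matrix.
    """
    s = np.linalg.svd(weight, compute_uv=False)
    return s ** 2


def allocate_bits(energies, avg_bits=2.0, min_bits=1, max_bits=4):
    """Greedy integer bit allocation under an average-bit budget.

    Assumption (not from the paper): the error proxy for an item with
    energy e quantized at b bits is e * 4**(-b), so each extra bit goes
    to whichever item currently has the largest proxy error.
    """
    energies = np.asarray(energies, dtype=np.float64)
    n = energies.size
    bits = np.full(n, min_bits, dtype=np.int64)
    budget = int(round(avg_bits * n)) - min_bits * n
    for _ in range(max(budget, 0)):
        err = energies * np.power(4.0, -bits.astype(np.float64))
        err[bits >= max_bits] = -np.inf  # items already at the cap
        k = int(np.argmax(err))
        if not np.isfinite(err[k]):
            break  # every item is at max_bits
        bits[k] += 1
    return bits


# Toy usage: four random "experts" with deliberately different scales.
rng = np.random.default_rng(0)
experts = [rng.standard_normal((16, 16)) * scale for scale in (4.0, 2.0, 1.0, 0.5)]
expert_energy = [spectral_energy(w).sum() for w in experts]
expert_bits = allocate_bits(expert_energy, avg_bits=2.0)
```

The same routine could be applied per singular direction inside one expert instead of per expert; which granularity BitsMoE uses is not stated in the abstract.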