Multi-Teacher Knowledge Distillation via Teacher-Informed Mixture Priors

arXiv:2605.27967v1 Announce Type: cross

Knowledge distillation is a powerful method for model compression, enabling the efficient deployment of complex deep learning models (teachers), including large language models. However, its underlying statistical mechanisms remain unclear, and uncertainty evaluation is often overlooked, especially in real-world scenarios requiring diverse teacher expertise. To address these challenges, we