AI RESEARCH

Distillation of Large Language Models via Concrete Score Matching

arXiv CS.AI

arXiv:2509.25837v3. Large language models (LLMs) deliver remarkable performance but are costly to deploy, motivating knowledge distillation (KD) for efficient inference. Existing KD objectives typically match student and teacher probabilities after a softmax, which blurs valuable logit information. Direct logit distillation (DLD) avoids this softmax smoothing, but it fails to account for logit shift invariance: adding the same constant to every logit leaves the softmax output unchanged, yet DLD penalizes such a shift, thereby restricting the solution space.
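The shift-invariance point can be checked numerically. The sketch below is an illustration only, not the paper's Concrete Score Matching objective: the logit values, the `shift` constant, and the helper names (`softmax`, `mse`, `centered`) are all assumptions made for this example. It shows that a student whose logits differ from the teacher's by a constant produces identical probabilities, while a plain mean-squared error on raw logits still reports a large loss. Mean-centering is used here only as one simple way to remove the shift.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; this does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mse(a, b):
    # Mean-squared error between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def centered(logits):
    # Remove the mean so that a constant shift no longer matters
    # (an illustrative choice, not the method proposed in the paper).
    mean = sum(logits) / len(logits)
    return [z - mean for z in logits]

# Hypothetical teacher logits over a three-token vocabulary.
teacher_logits = [2.0, 0.5, -1.0]

# Hypothetical student that matches the teacher up to a constant shift.
shift = 3.0
student_logits = [z + shift for z in teacher_logits]

# Largest difference between the two softmax distributions.
prob_gap = max(
    abs(p - q)
    for p, q in zip(softmax(teacher_logits), softmax(student_logits))
)

# Direct logit matching penalizes the shift even though predictions agree.
raw_logit_mse = mse(teacher_logits, student_logits)

# After centering, the two logit vectors coincide.
centered_logit_mse = mse(centered(teacher_logits), centered(student_logits))

print("max probability gap:", prob_gap)
print("raw logit MSE:", raw_logit_mse)
print("centered logit MSE:", centered_logit_mse)
```

In this toy case the student is a perfect match in probability space, so any loss it incurs under raw logit matching comes purely from the constant offset. That is the sense in which ignoring shift invariance restricts the set of acceptable student solutions.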