AI RESEARCH

What Do Students Learn? A Feature-Level Analysis of Dark Knowledge

arXiv cs.LG

arXiv:2606.03052v1 Announce Type: new Knowledge Distillation (KD) is a powerful tool for model compression, yet the precise mechanisms by which student models acquire feature representations remain underexplored. In this work, we analyze student feature learning using the Interaction Tensor framework. Our analysis reveals that effective KD acts as a regularizer that prunes low-frequency, sample-specific features, encouraging the student to rely on a compact set of highly reusable features.