The Distillation Game: Adaptive Attacks & Efficient Defenses

arXiv:2605.22737v1 Announce Type: cross Distillation attacks create a deployment trade-off for model providers: the same outputs that make a model useful can also make it easier to imitate. We study this trade-off through a minimax game between a utility-constrained teacher and an adaptive student. Our framework yields tractable one-sided response rules: an adaptive evaluation rule in which the student reweights high-value examples, and a teacher-side defense template that suppresses outputs most useful for distillation.
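
The two response rules can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: the per-example value and usefulness scores, the softmax form of the student's reweighting, the fixed suppression budget standing in for the teacher's utility constraint, and the hypothetical names `student_weights`, `reweighted_loss`, and `teacher_suppress`.

```python
import math


def student_weights(values, temperature=1.0):
    """Assumed student rule: softmax weights over per-example value scores.

    Higher-value examples receive more weight. The softmax form is an
    illustrative choice, not the rule derived in the paper.
    """
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def reweighted_loss(losses, values, temperature=1.0):
    """Weighted average of per-example distillation losses."""
    weights = student_weights(values, temperature)
    return sum(w * l for w, l in zip(weights, losses))


def teacher_suppress(usefulness, budget):
    """Assumed teacher rule: suppress the `budget` outputs with the highest
    distillation-usefulness score. The fixed budget is a stand-in for the
    utility constraint mentioned in the abstract.
    """
    order = sorted(range(len(usefulness)),
                   key=lambda i: usefulness[i], reverse=True)
    return set(order[:budget])
```

In this sketch, equal value scores reduce the student's objective to a plain average of the losses, and a budget of zero leaves the teacher's outputs untouched, so both rules degrade to the undefended, non-adaptive baseline.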