AI RESEARCH
X-Token: Projection-Guided Cross-Tokenizer Knowledge Distillation
arXiv CS.CL
arXiv:2605.21699v1 Announce Type: cross Cross-tokenizer knowledge distillation allows a student model to learn from teachers with incompatible vocabularies. Prior work operates on either hidden states or logits; logit-based methods are preferred because they serve as a drop-in replacement that requires no auxiliary components. Existing logit-based methods either use only the correct-token probability, missing the full 'dark knowledge' in the teacher's distribution, or operate on the full output distribution, relying on strict token partitioning and/or unprincipled heuristic ranking.
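To make the 'dark knowledge' point concrete, here is a minimal sketch in Python. It is not the X-Token method, and it sidesteps the cross-tokenizer problem by assuming a hypothetical 4-token vocabulary shared by teacher and student. All probabilities and names (`teacher_a`, `teacher_b`, `student`) are illustrative assumptions. The sketch shows two teachers that assign the same probability to the correct token but spread the remaining mass differently: a correct-token-only signal cannot tell them apart, while a full-distribution KL divergence can.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical shared vocabulary of 4 tokens; index 0 is the correct next token.
correct = 0

# Two illustrative teachers: identical mass on the correct token,
# different mass on the incorrect tokens (the "dark knowledge").
teacher_a = [0.70, 0.25, 0.04, 0.01]
teacher_b = [0.70, 0.01, 0.04, 0.25]

# An illustrative student distribution over the same vocabulary.
student = [0.60, 0.20, 0.10, 0.10]

# Correct-token-only view: all it reads from the teacher is one number.
signal_a = teacher_a[correct]
signal_b = teacher_b[correct]
same_signal = signal_a == signal_b

# Full-distribution view: KL divergence from each teacher to the student.
kl_a = kl_divergence(teacher_a, student)
kl_b = kl_divergence(teacher_b, student)

print("correct-token signals equal:", same_signal)
print("KL(teacher_a || student):", round(kl_a, 4))
print("KL(teacher_b || student):", round(kl_b, 4))
```

With a shared vocabulary the full-distribution comparison is straightforward; when teacher and student tokenizers differ, the two distributions live over different token sets, which is the alignment difficulty the abstract describes.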