X-Token: Projection-Guided Cross-Tokenizer Knowledge Distillation
Signal: 78
Hype: 15
In three lines
X-Token introduces cross-tokenizer knowledge distillation via two complementary loss formulations (P-KL and H-KL) that use a projection matrix W. On Llama-3.2-1B, the method outperforms GOLD by +3.82 points with Qwen3-4B and by +0.5 points with Phi-4-mini; a two-teacher setup (Phi-4-mini + Llama-3B) gains +1.3 points.
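The summary does not define P-KL or H-KL, so the following is only a minimal sketch of the general idea behind a projection-guided loss, not the paper's formulation. It assumes W is a row-stochastic matrix whose entry W[i][j] says how much of teacher token i maps onto student token j; the helper names `project` and `kl_divergence` and the toy numbers are hypothetical.

```python
import math


def project(teacher_probs, W):
    """Map a distribution over the teacher vocabulary into the student vocabulary.

    Assumption: each row of W sums to 1, so the projected vector
    is still a probability distribution.
    """
    n_student = len(W[0])
    return [
        sum(teacher_probs[i] * W[i][j] for i in range(len(W)))
        for j in range(n_student)
    ]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Toy setup (hypothetical): 3 teacher tokens, 2 student tokens.
teacher_probs = [0.7, 0.2, 0.1]
W = [
    [1.0, 0.0],  # teacher token 0 maps entirely to student token 0
    [0.5, 0.5],  # teacher token 1 is split between both student tokens
    [0.0, 1.0],  # teacher token 2 maps entirely to student token 1
]
student_probs = [0.6, 0.4]

projected = project(teacher_probs, W)
loss = kl_divergence(projected, student_probs)
print(projected, loss)
```

Once the teacher distribution lives in the student's vocabulary, an ordinary KL term can compare the two models even though their tokenizers differ. How X-Token builds W, and how P-KL and H-KL differ from this plain KL, is left to the source.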
Summary generated by Claude — human-verified