When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning
Signal: 78 | Hype: 15
In three lines
The authors show that the reliability of teacher tokens in reasoning self-distillation depends on the token's position within the trajectory, not on its local entropy.
They propose Position-Weighted On-Policy Self-Distillation (PW-OPSD), which applies position weights that increase along the trajectory to token-level supervision.
On Qwen3-4B, AIME 2024 and AIME 2025 scores improve by +1.0 and +1.1 points, respectively; experiments on DeepSeek-R1-Distill-Llama-8B and Olmo-3-7B-Think confirm the gains.
Summary generated by Claude — human-verified
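The summary does not state the exact weighting schedule or the per-token loss used by PW-OPSD. The sketch below illustrates the general idea under loudly stated assumptions. It uses a linear ramp from a hypothetical `w_min` to `w_max` and treats each entry of `token_losses` as some per-token distillation loss, for example a divergence between student and teacher distributions. The function names and default values are hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence


def position_weights(num_tokens: int, w_min: float = 0.5, w_max: float = 1.5) -> List[float]:
    """Hypothetical linear schedule: the weight rises from w_min at the first
    token to w_max at the last token of the trajectory (assumed, not from the paper)."""
    if num_tokens <= 0:
        return []
    if num_tokens == 1:
        return [w_max]
    step = (w_max - w_min) / (num_tokens - 1)
    return [w_min + step * t for t in range(num_tokens)]


def position_weighted_loss(
    token_losses: Sequence[float], w_min: float = 0.5, w_max: float = 1.5
) -> float:
    """Weighted mean of per-token distillation losses, so that supervision on
    later tokens counts more than supervision on earlier ones."""
    weights = position_weights(len(token_losses), w_min, w_max)
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # empty trajectory: nothing to supervise
    return sum(w * loss for w, loss in zip(weights, token_losses)) / total_weight


if __name__ == "__main__":
    # A loss concentrated on the last token is up-weighted relative to a plain mean.
    losses = [0.0, 0.0, 3.0]
    print(position_weights(len(losses)))    # [0.5, 1.0, 1.5]
    print(position_weighted_loss(losses))   # 1.5 (a uniform mean would give 1.0)
```

A linear ramp is only one choice that satisfies "increasing position weights". Any monotone schedule, such as a step function or an exponential ramp, would fit the same description. Normalizing by the total weight keeps the loss on the same scale as an unweighted mean.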