OISD: On-Policy Internal Self-Distillation of Language Models
Signal
78
Hype
15
In three lines
OISD introduces on-policy internal self-distillation to improve language model reasoning. The final layer acts as a detached teacher for intermediate layers through logit alignment (reasoning behaviors) and attention alignment (attention patterns), with no external privileged information. The source reports positive results across four mathematical reasoning tasks.
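To make the two alignment terms concrete, here is a minimal sketch of a per-token loss in plain Python. Everything beyond the summary is an assumption: the use of KL divergence for both terms, the `attn_weight` parameter, the function names, and the idea that intermediate-layer logits come from passing an intermediate hidden state through the model's output head. The paper's actual objectives may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def internal_distillation_loss(final_logits, mid_logits,
                               final_attn, mid_attn, attn_weight=1.0):
    """Hypothetical combined loss for one token position.

    final_logits / final_attn: outputs of the final layer (the teacher).
    mid_logits / mid_attn: outputs of one intermediate layer (the student).
    In an autograd framework the teacher values would be detached
    (stop-gradient) so that only the intermediate layer is updated;
    plain Python has no gradients, so that step is only noted here.
    """
    teacher_probs = softmax(final_logits)   # treated as constants (detached)
    student_probs = softmax(mid_logits)
    logit_loss = kl_divergence(teacher_probs, student_probs)  # assumed form
    attn_loss = kl_divergence(final_attn, mid_attn)           # assumed form
    return logit_loss + attn_weight * attn_loss
```

The loss is zero when the intermediate layer already matches the final layer and grows as the two diverge, which is the behavior one would expect from any alignment objective of this kind.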
Summary generated by Claude — human-verified