arXiv cs.AI

Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation

Signal: 82 · Hype: 15
In three lines: A unified study of LLM distillation showing that SFT, DAgger, offline RL, and OPD can be described by decoupling two orthogonal axes: the prefix source and the token-level KL direction. The authors propose KL mixing and an entropy-gated length curriculum, improving Pass@k by 5.8 points and reducing average response length by 3x on math reasoning.
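To make the "token-level KL direction" axis concrete, here is a minimal sketch for a single token position. It assumes KL mixing means a convex combination of forward KL (teacher to student) and reverse KL (student to teacher); the summary does not give the paper's actual formulation, so the function names, the weight `alpha`, and the toy distributions are all hypothetical. The prefix-source axis (who generated the context the distributions are conditioned on) is not modeled here.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary.

    Assumes every q[i] is strictly positive wherever p[i] is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mixed_kl(teacher, student, alpha):
    """Convex mix of forward and reverse KL at one token position.

    alpha is a hypothetical mixing weight (an assumption, not the paper's):
    alpha = 1.0 gives pure forward KL(teacher || student),
    alpha = 0.0 gives pure reverse KL(student || teacher).
    """
    forward = kl(teacher, student)
    reverse = kl(student, teacher)
    return alpha * forward + (1.0 - alpha) * reverse


# Toy next-token distributions over a three-token vocabulary (illustrative only).
teacher = [0.7, 0.2, 0.1]
student = [0.4, 0.4, 0.2]

for alpha in (0.0, 0.5, 1.0):
    print(f"alpha={alpha:.1f}  loss={mixed_kl(teacher, student, alpha):.4f}")
```

The two directions give different values for the same pair of distributions, which is why the choice of direction is a separate design decision from where the prefixes come from.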
Fine-tuning · Reinforcement learning · Reasoning · Papers

Summary generated by Claude — human-verified