Rethinking the Role of Temperature in Large Language Model Distillation
Signal: 72
Hype: 18
In three lines
arXiv paper on temperature's role in LLM distillation. Authors show forward KL (FKL) outperforms reverse KL (RKL) at higher temperatures, contrary to prior empirical conclusions that omitted this parameter. Temperature enriches FKL with non-dominant token signals while merely rescaling RKL gradients.
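For readers who want the quantities made concrete, here is a minimal sketch of temperature-scaled distillation losses. It assumes the common convention that both teacher and student logits are divided by the same temperature before the softmax, with FKL taken as KL(teacher || student) and RKL as KL(student || teacher). The function names and the toy logits are hypothetical and are not taken from the paper, and the sketch does not reproduce the paper's results. It only illustrates how raising the temperature moves teacher probability mass onto non-dominant tokens.

```python
import math


def softmax_t(logits, temperature=1.0):
    """Temperature-scaled softmax: softmax(logits / temperature)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for discrete distributions; q is assumed to have full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def forward_kl(teacher_logits, student_logits, temperature=1.0):
    """FKL under the assumed convention: KL(teacher || student)."""
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    return kl(p, q)


def reverse_kl(teacher_logits, student_logits, temperature=1.0):
    """RKL under the assumed convention: KL(student || teacher)."""
    p = softmax_t(teacher_logits, temperature)
    q = softmax_t(student_logits, temperature)
    return kl(q, p)


# Toy logits over a 4-token vocabulary (illustrative values only).
teacher_logits = [4.0, 1.0, 0.5, 0.0]
student_logits = [3.0, 2.0, 0.0, -1.0]

for t in (1.0, 2.0, 4.0):
    p = softmax_t(teacher_logits, t)
    non_dominant_mass = 1.0 - max(p)  # teacher mass outside the top token
    fkl = forward_kl(teacher_logits, student_logits, t)
    rkl = reverse_kl(teacher_logits, student_logits, t)
    print(f"T={t}: non-dominant mass={non_dominant_mass:.3f}, "
          f"FKL={fkl:.4f}, RKL={rkl:.4f}")
```

The printed non-dominant mass grows with the temperature, which is the softening effect the summary refers to. The loss values themselves depend entirely on the toy logits and say nothing about which objective trains a better student.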
Summary generated by Claude — human-verified