arXiv cs.CL

Unified Data Selection for LLM Reasoning

Signal: 72
Hype: 25
In three lines: HES (High-Entropy Sum) is a training-free metric for selecting high-quality reasoning data for LLMs. Tested across SFT, RFT, and RL paradigms, it achieves full-dataset performance using only the top 20% of samples, significantly reducing computational overhead.
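The summary does not spell out how HES is computed, so the following is only a minimal sketch of what an entropy-based selection step could look like. It assumes that a sample's score is the sum of the entropies of its highest-entropy tokens, and that the top 20% of samples by score are kept. The function names, the `top_frac` parameter, and the input format (one next-token probability distribution per token) are all hypothetical and not taken from the paper.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_sum(token_dists, top_frac=0.1):
    """Sum the entropies of the highest-entropy tokens in one sample.

    ASSUMPTION: this reading of "High-Entropy Sum" is inferred from the name;
    top_frac is a hypothetical knob, not a value reported in the paper.
    """
    entropies = sorted((token_entropy(d) for d in token_dists), reverse=True)
    if not entropies:
        return 0.0
    k = max(1, math.ceil(top_frac * len(entropies)))
    return sum(entropies[:k])


def select_top(samples, scores, keep_frac=0.2):
    """Return the keep_frac highest-scoring samples, best first."""
    k = max(1, math.ceil(keep_frac * len(samples)))
    order = sorted(range(len(samples)), key=lambda i: scores[i], reverse=True)
    return [samples[i] for i in order[:k]]
```

In this sketch the scores come from a single forward pass of an existing model over each sample, with no gradient updates, which is one way a metric can be "training-free". The 20% default for `keep_frac` mirrors the figure quoted in the summary.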
Reasoning · Fine-tuning · Reinforcement learning · Papers

Summary generated by Claude — human-verified