Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
Signal: 78
Hype: 25
In three lines
Learning-Zone Energy (LZE) is an online data selection framework for RL post-training of LLMs. Tested on Qwen models from 1.5B to 8B parameters on GSM8K and MATH, it keeps 40% of the training data at each step while matching full-data baselines, and reports out-of-distribution gains of +45.9% on AIME25 alongside a 36% reduction in FLOPs.
Summary generated by Claude — human-verified