arXiv cs.AI

Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training

Signal: 78, Hype: 25
In three lines: Learning-Zone Energy (LZE) is an online data selection framework for RL post-training of LLMs. Tested on Qwen models from 1.5B to 8B parameters on GSM8K and MATH, it keeps 40% of the training data at each step while matching full-data baselines, with a reported out-of-distribution gain of +45.9% on AIME25 and a 36% reduction in FLOPs.
Reinforcement learning, Reasoning, Benchmarks, Qwen, Code generation

Summary generated by Claude — human-verified