arXiv cs.CL

D$^2$Evo: Dual Difficulty-Aware Self-Evolution for Data-Efficient Reinforcement Learning

Signal: 75
Hype: 25
In three lines: D²Evo is a reinforcement learning (RL) framework for improving LLM reasoning. It addresses the scarcity of medium-difficulty samples by mining anchors matched to the model's capability and training a Questioner to generate diverse questions at an appropriate difficulty. Result: it outperforms existing methods on math benchmarks with fewer than 2K real samples.
Reinforcement learning, Reasoning, Benchmarks

Summary generated by Claude — human-verified