arXiv cs.LG

CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning

Signal: 78
Hype: 15
In three lines: CurveRL introduces a distribution-aware prompt reweighting method for Reinforcement Learning with Verifiable Rewards (RLVR), built on quantile coordinate transforms. Each prompt's weight depends on the rank and density of its pass rate within the distribution rather than on the absolute value. The method is reported to consistently outperform GRPO and other RLVR baselines across benchmarks.
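
To make the "rank rather than absolute value" idea concrete, here is a minimal Python sketch. It is not CurveRL's actual formula, which this summary does not give. The function names, the mid-rank quantile estimate, and the bump-shaped weight 4u(1 - u) are all assumptions chosen for illustration, and the density term mentioned above is left out entirely.

```python
import numpy as np

def quantile_coordinates(pass_rates):
    """Map each prompt's pass rate to its mid-rank empirical quantile in (0, 1).

    Hypothetical helper: not taken from the CurveRL paper.
    """
    p = np.asarray(pass_rates, dtype=float)
    # Pairwise comparisons: for prompt i, count prompts with a strictly
    # smaller pass rate, and prompts tied with it (including itself).
    less = (p[None, :] < p[:, None]).sum(axis=1)
    equal = (p[None, :] == p[:, None]).sum(axis=1)
    return (less + 0.5 * equal) / p.size

def rank_based_weights(pass_rates):
    """Assign prompt weights from quantile coordinates only.

    The shape 4u(1 - u) is an assumed placeholder that peaks at the median;
    weights are normalized to have mean 1.
    """
    u = quantile_coordinates(pass_rates)
    w = 4.0 * u * (1.0 - u)
    return w / w.mean()

if __name__ == "__main__":
    rates = np.array([0.1, 0.5, 0.9, 0.3])
    print(rank_based_weights(rates))
    # Squaring is monotone on [0, 1], so ranks (and weights) are unchanged.
    print(rank_based_weights(rates ** 2))
```

Because the weights are computed from ranks alone, any strictly increasing transformation of the pass rates leaves them unchanged, which is the sense in which such a scheme ignores absolute values.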
Reasoning · Reinforcement learning · Papers

Summary generated by Claude — human-verified