CurveRL: Principled Distribution-Aware Context Reweighting for LLM Reasoning
Signal: 78
Hype: 15
In three lines: CurveRL introduces a distribution-aware prompt reweighting method for Reinforcement Learning with Verifiable Rewards (RLVR), built on quantile coordinate transforms. Each prompt's weight depends on the rank and density of its pass rate rather than on the absolute value. The method consistently outperforms GRPO and other RLVR baselines across benchmarks.
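To make the rank-versus-absolute-value idea concrete, here is a minimal sketch of rank-based prompt weighting. It is an illustration only, not CurveRL's actual formula: the function names `quantile_coordinates` and `rank_weights`, the mid-rank quantile convention, and the bell-shaped curve `4u(1-u)` are all assumptions made for this example. The one property it is meant to show is that weights computed in quantile coordinates do not change when pass rates are rescaled by any strictly increasing function.

```python
from typing import List


def quantile_coordinates(pass_rates: List[float]) -> List[float]:
    """Map each pass rate to its mid-rank quantile in (0, 1).

    Tied pass rates share the average rank of their group.
    """
    n = len(pass_rates)
    order = sorted(range(n), key=lambda i: pass_rates[i])
    u = [0.0] * n
    start = 0
    while start < n:
        end = start
        # Extend the group while the next sorted value ties with this one.
        while end + 1 < n and pass_rates[order[end + 1]] == pass_rates[order[start]]:
            end += 1
        # Average 0-based position of the group, shifted by 0.5, scaled to (0, 1).
        mid = ((start + end) / 2 + 0.5) / n
        for k in range(start, end + 1):
            u[order[k]] = mid
        start = end + 1
    return u


def rank_weights(pass_rates: List[float]) -> List[float]:
    """Hypothetical weighting curve applied in quantile coordinates.

    The curve 4u(1-u) is an assumption for illustration: it peaks at the
    median prompt and decays toward the easiest and hardest prompts.
    Weights are normalized to have mean 1.
    """
    if not pass_rates:
        return []
    u = quantile_coordinates(pass_rates)
    raw = [4.0 * x * (1.0 - x) for x in u]
    mean = sum(raw) / len(raw)
    return [r / mean for r in raw]


if __name__ == "__main__":
    rates = [0.05, 0.10, 0.50, 0.95]
    squared = [p ** 2 for p in rates]  # strictly increasing rescaling
    print(rank_weights(rates))
    print(rank_weights(squared))  # same weights: only the ranks matter
```

Because the quantile map follows the empirical distribution of pass rates, a region where many prompts cluster is spread over a wider range of quantile coordinates, which is one plausible reading of how density enters the weighting.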
Summary generated by Claude — human-verified