Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key
Signal: 82
Hype: 18
In three lines
ScaleLogic, a synthetic logical reasoning framework, demonstrates that RL can teach long-horizon reasoning to LLMs. Training compute T follows a power law in proof depth D (T ∝ D^γ, R² > 0.99), with the exponent γ increasing from 1.04 to 2.60 as logical expressiveness grows. Models trained on more expressive logics transfer better (+10.66 points on downstream benchmarks).
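As a rough illustration of what a power-law fit like T ∝ D^γ involves, the sketch below estimates γ by ordinary least squares in log-log space and reports R². It is not the paper's code: the function name `fit_power_law` is hypothetical, and the depth and compute values are synthetic numbers made up for the example (an exact power law with γ = 2.0), not results from ScaleLogic.

```python
import math


def fit_power_law(depths, compute):
    """Fit compute ≈ c * depth**gamma by least squares on (log depth, log compute).

    Returns (gamma, c, r_squared), where r_squared is computed in log space.
    """
    xs = [math.log(d) for d in depths]
    ys = [math.log(t) for t in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope of the log-log regression line is the power-law exponent.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    log_c = mean_y - gamma * mean_x

    # Coefficient of determination of the fitted line in log space.
    ss_res = sum((y - (log_c + gamma * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot
    return gamma, math.exp(log_c), r_squared


# Synthetic, illustrative data only: T = 3 * D^2 at a few proof depths.
depths = [2, 4, 8, 16, 32]
compute = [3.0 * d ** 2.0 for d in depths]

gamma, c, r2 = fit_power_law(depths, compute)
print(f"gamma={gamma:.2f}, c={c:.2f}, R^2={r2:.4f}")
```

On this reading, an exponent near 1 means compute grows roughly linearly with proof depth, while an exponent near 2.6 means doubling the depth multiplies the required compute by about 2^2.6 ≈ 6.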
Summary generated by Claude — human-verified