arXiv cs.AI·19 May 2026

Can RL Teach Long-Horizon Reasoning to LLMs? Expressiveness Is Key

In three lines

ScaleLogic, a synthetic logical reasoning framework, demonstrates that RL can teach long-horizon reasoning to LLMs. Training compute T follows a power law in proof depth D (T ∝ D^γ, R² > 0.99), with the exponent γ increasing from 1.04 to 2.60 as logical expressiveness grows. Models trained on more expressive logics transfer better (+10.66 points on downstream benchmarks).
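To make the power-law claim concrete, here is a minimal sketch of how an exponent γ and an R² value could be estimated from (depth, compute) pairs by linear regression in log-log space. The function name `fit_power_law`, the prefactor, and the data points are hypothetical and purely synthetic; they are not taken from the paper, and the paper may fit or score its curves differently.

```python
import numpy as np

def fit_power_law(depths, compute):
    """Fit T = c * D**gamma by ordinary least squares on log-transformed data.

    Returns (gamma, c, r2), where r2 is computed in log-log space
    (an assumption; the source does not say how its R^2 is measured).
    """
    log_d = np.log(np.asarray(depths, dtype=float))
    log_t = np.log(np.asarray(compute, dtype=float))
    # Slope of the log-log line is the exponent; intercept is log(c).
    gamma, log_c = np.polyfit(log_d, log_t, 1)
    pred = gamma * log_d + log_c
    ss_res = np.sum((log_t - pred) ** 2)
    ss_tot = np.sum((log_t - log_t.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return gamma, np.exp(log_c), r2

# Synthetic, illustrative data only: an exact power law with gamma = 2.6.
depths = np.array([2, 4, 8, 16, 32])
compute = 3.0 * depths ** 2.6

gamma, c, r2 = fit_power_law(depths, compute)
print(f"gamma={gamma:.2f}, c={c:.2f}, R^2={r2:.4f}")
```

Under T ∝ D^γ, doubling the proof depth multiplies training compute by 2^γ: roughly 2.06x at γ = 1.04 and roughly 6.06x at γ = 2.60, which is why the exponent matters more than the prefactor at large depths.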

Reinforcement learning · Reasoning · Benchmarks · Papers

Summary generated by Claude — human-verified
