arXiv cs.AI·19 May 2026

Inference-Time Diversity in RL-Trained Lean Theorem Provers: A Diagnostic Study

In three lines: RL-trained Lean theorem provers suffer mode collapse at inference. With DeepSeek-Prover-V1.5-RL on miniF2F-test, doubling the sampling budget from k=32 to k=64 solves zero additional theorems (the count stays at 42/244). Injecting fixed structural diversity via 15 tactic skeletons yields a +45% relative improvement at k=16 (+12.3±4.2 theorems). The phenomenon is reported as RL-specific and orthogonal to scaling.
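As a rough sanity check on the figures quoted in the summary, the arithmetic can be sketched as follows. This is a back-of-envelope calculation, not a result from the paper: it assumes the +45% relative gain and the +12.3 theorem gain are both measured against the same k=16 baseline, and the variable names are illustrative only.

```python
# Figures quoted in the summary.
TOTAL_THEOREMS = 244   # size of miniF2F-test
SOLVED_AT_K32 = 42     # solved at k=32, unchanged at k=64

# Solve rate at the plateau.
solve_rate = SOLVED_AT_K32 / TOTAL_THEOREMS

# ASSUMPTION: "+45% relative" and "+12.3 theorems" describe the same
# comparison at k=16, so baseline = absolute gain / relative gain.
ABS_GAIN = 12.3
REL_GAIN = 0.45
implied_baseline = ABS_GAIN / REL_GAIN
implied_with_diversity = implied_baseline + ABS_GAIN

print(f"plateau solve rate: {solve_rate:.1%}")
print(f"implied k=16 baseline: {implied_baseline:.1f} theorems")
print(f"implied k=16 with skeletons: {implied_with_diversity:.1f} theorems")
```

The implied values are averages and carry the ±4.2 uncertainty reported for the gain, so they should be read as approximate.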

Reasoning Reinforcement learning Benchmarks Papers

Summary generated by Claude — human-verified
