Inference-Time Diversity in RL-Trained Lean Theorem Provers: A Diagnostic Study
Signal: 78
Hype: 15
In three lines
RL-trained Lean theorem provers suffer mode collapse at inference: with DeepSeek-Prover-V1.5-RL on miniF2F-test, doubling the sample budget from k=32 to k=64 solves zero additional theorems (42/244). Injecting fixed structural diversity via 15 tactic skeletons yields a +45% relative improvement at k=16 (+12.3±4.2 theorems). The phenomenon is RL-specific and orthogonal to scaling.
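The saturation check described above (no new theorems solved when k doubles) can be illustrated with a minimal sketch. It is not the study's evaluation code: the function names `solved_at_k` and `marginal_gain`, the data layout, and the toy outcomes are all hypothetical, chosen only to show how a "zero additional theorems" result would be measured.

```python
from typing import Dict, List


def solved_at_k(results: Dict[str, List[bool]], k: int) -> int:
    """Count theorems with at least one verified proof among the first k samples.

    `results` maps a theorem name to per-sample outcomes (True = proof checked).
    This layout is an assumption made for illustration.
    """
    return sum(1 for attempts in results.values() if any(attempts[:k]))


def marginal_gain(results: Dict[str, List[bool]], k_small: int, k_large: int) -> int:
    """Extra theorems solved when the sample budget grows from k_small to k_large."""
    return solved_at_k(results, k_large) - solved_at_k(results, k_small)


# Toy outcomes (hypothetical, not data from the study).
toy_results = {
    "thm_a": [False, True, False, False],   # solved within the first 2 samples
    "thm_b": [False, False, False, False],  # never solved
    "thm_c": [True, True, True, True],      # solved by every sample
}

if __name__ == "__main__":
    print(solved_at_k(toy_results, 2))       # theorems solved at k=2
    print(marginal_gain(toy_results, 2, 4))  # gain from doubling k
```

In this toy set, doubling the budget adds nothing because the extra samples only re-solve the same theorems or repeat the same failures, which is the pattern the summary attributes to mode collapse.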
Summary generated by Claude — human-verified