Learning How to Cube
Signal: 75
Hype: 25
In three lines
A neuro-symbolic post-training framework trains a 4B-parameter model to generate cubing heuristics for SAT solving, using supervised fine-tuning (SFT) and Direct Preference Optimization (DPO).
The model reaches pass@5 = 53 on 100 SAT competition benchmarks, matching the best symbolic heuristic and surpassing Claude-Sonnet-4 (50).
Training data comes from a Monte Carlo tree search (MCTS) pipeline that explores splitting decisions on SAT competition formulas.
Summary generated by Claude — human-verified
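For readers new to the term: a cubing heuristic decides which variables to split a SAT formula on, so that each branch becomes a "cube" (a partial assignment) that can be solved independently. The following is a minimal sketch of that idea only, assuming DIMACS-style integer literals. The split rule (pick the most frequently occurring variable) is a toy stand-in, not the model-generated heuristics or the MCTS pipeline described above, and the names `simplify`, `pick_split_var`, and `cube` are hypothetical.

```python
from collections import Counter
from typing import List, Optional

Clause = List[int]  # DIMACS-style literals: 3 means x3, -3 means NOT x3
Cube = List[int]    # a conjunction of literals, i.e. a partial assignment


def simplify(formula: List[Clause], cube: Cube) -> List[Clause]:
    """Apply the cube: drop satisfied clauses and remove falsified literals."""
    true_lits = set(cube)
    result = []
    for clause in formula:
        if any(lit in true_lits for lit in clause):
            continue  # clause is already satisfied by the cube
        result.append([lit for lit in clause if -lit not in true_lits])
    return result


def pick_split_var(formula: List[Clause]) -> int:
    """Toy heuristic (an assumption): split on the most frequent variable."""
    counts = Counter(abs(lit) for clause in formula for lit in clause)
    return counts.most_common(1)[0][0]


def cube(formula: List[Clause], depth: int, prefix: Optional[Cube] = None) -> List[Cube]:
    """Recursively split on up to `depth` variables and return the leaf cubes."""
    prefix = prefix or []
    current = simplify(formula, prefix)
    # Stop when the depth budget is spent, every clause is satisfied,
    # or the cube falsifies some clause (an empty clause means a conflict).
    if depth == 0 or not current or any(len(c) == 0 for c in current):
        return [prefix]
    var = pick_split_var(current)
    # Branch on var = true and var = false; the two subtrees cover all assignments.
    return (cube(formula, depth - 1, prefix + [var])
            + cube(formula, depth - 1, prefix + [-var]))
```

For example, `cube([[1, 2], [-1, 3], [-2, -3], [1, -3]], 2)` splits first on x1 and then on a different variable in each branch, yielding four cubes that together cover every assignment exactly once. In a cube-and-conquer setup, each cube would then be handed to a conventional solver; the quality of the split choices is what a learned heuristic tries to improve.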