arXiv cs.AI · 19 May 2026

Learning How to Cube


In three lines

A neuro-symbolic post-training framework trains a 4B-parameter model to generate cubing heuristics for SAT via SFT+DPO. The model achieves pass@5 = 53 on 100 SAT competition benchmarks, matching the best symbolic heuristic and surpassing Claude-Sonnet-4 (50). Data comes from an MCTS pipeline that explores splitting decisions over SAT competition formulas.
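For readers unfamiliar with cubing, the following is a minimal sketch of the general idea, not the paper's implementation: a heuristic picks a variable to split on, the formula is divided into subproblems ("cubes"), and each cube is solved independently. Every name here (`simplify`, `most_frequent_var`, `cube`, `solve_by_cubing`) is hypothetical, the occurrence-count heuristic is a stand-in for where a learned or symbolic heuristic would plug in, and the brute-force solver stands in for a real SAT solver.

```python
from itertools import product

# Clauses are tuples of non-zero ints (DIMACS-style literals): 3 means x3, -3 means NOT x3.

def simplify(clauses, lit):
    """Assign `lit` true: drop satisfied clauses, remove the falsified literal."""
    out = []
    for c in clauses:
        if lit in c:
            continue  # clause already satisfied
        out.append(tuple(l for l in c if l != -lit))
    return out

def most_frequent_var(clauses):
    """Placeholder splitting heuristic (assumption): the most frequently occurring variable."""
    counts = {}
    for c in clauses:
        for l in c:
            counts[abs(l)] = counts.get(abs(l), 0) + 1
    # Ties are broken toward the smallest variable index.
    return max(sorted(counts), key=counts.get) if counts else None

def cube(clauses, depth, heuristic=most_frequent_var, prefix=()):
    """Split recursively; return (cube, simplified_clauses) pairs that are not yet refuted."""
    if any(len(c) == 0 for c in clauses):
        return []  # an empty clause means this cube is already unsatisfiable
    var = heuristic(clauses) if depth > 0 else None
    if var is None:
        return [(prefix, clauses)]
    cubes = []
    for lit in (var, -var):
        cubes += cube(simplify(clauses, lit), depth - 1, heuristic, prefix + (lit,))
    return cubes

def brute_force_sat(clauses):
    """Toy stand-in for a real SAT solver: try every assignment."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for bits in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, bits))
        if all(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def solve_by_cubing(clauses, depth):
    """The formula is satisfiable iff at least one cube is satisfiable."""
    return any(brute_force_sat(sub) for _, sub in cube(clauses, depth))

sat_formula = [(1, 2), (-1, 2), (-2, 3)]
unsat_formula = [(1, 2), (-1, 2), (1, -2), (-1, -2)]
print(solve_by_cubing(sat_formula, depth=2))
print(solve_by_cubing(unsat_formula, depth=2))
```

In this sketch the quality of the split determines how many cubes survive and how hard each one is, which is the decision a cubing heuristic is meant to improve.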



Summary generated by Claude — human-verified

