arXiv cs.AI

ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics

Signal: 82
Hype: 15
In three lines
ComBench is a benchmark of 100 Olympiad-level combinatorics problems to evaluate LLM mathematical reasoning. It distinguishes analysis-centric problems (rigorous proofs) from construction-centric problems (explicit constructions). Top models reach 65.4% average and 75.3% Best@4. Kimi-K2.6 outperforms GPT-4o on constructions but trails on proof grading.
Benchmarks · Reasoning · Evals

Summary generated by Claude — human-verified