Hugging Face Blog

NPHardEval Leaderboard: Unveiling the Reasoning Abilities of Large Language Models through Complexity Classes and Dynamic Updates

Signal: 75
Hype: 25
In three lines: The Hugging Face Blog introduces the NPHardEval Leaderboard, a benchmark that assesses LLM reasoning on problems organized by computational complexity class, up to NP-hard, with dynamically updated tasks. The leaderboard ranks models by their performance on tasks of increasing complexity.
Benchmarks, Reasoning, Evals

Summary generated by Claude — human-verified