arXiv cs.CL · 20 May 2026

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

In three lines: LLMEval-Logic is a Chinese logical reasoning benchmark with 246 base items and 190 hard items, verified with the Z3 solver and audited by experts. In an evaluation of 14 frontier LLMs, the best scores reported are 37.5% on hard items and 60.16% on Z3+rubric formalization.

Benchmarks Reasoning Evals

Summary generated by Claude — human-verified
