arXiv cs.CL

CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers

Signal: 78
Hype: 15
In three lines: CoCoReviewBench is a 3,900-paper benchmark (ICLR, NeurIPS) for evaluating AI reviewer systems. It addresses metric bias by using the reviewer, author, and meta-review discussions as expert annotations. Results show that AI reviewers suffer from hallucinations and that reasoning models are more effective reviewers.
Tags: Benchmarks, Reasoning, Evals, Papers

Summary generated by Claude — human-verified