CoCoReviewBench: A Completeness- and Correctness-Oriented Benchmark for AI Reviewers
Signal: 78
Hype: 15
In three lines: CoCoReviewBench is a 3,900-paper benchmark (ICLR, NeurIPS) to evaluate AI reviewer systems. It addresses metric bias by using reviewer-author-meta-review discussions as expert annotations. Results show AI reviewers suffer from hallucinations and reasoning models are more effective reviewers.
Summary generated by Claude — human-verified