arXiv cs.CL · 21 May 2026

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists


In three lines: An expert study (45 scientists, 469 hours) evaluated 2,960 criticisms drawn from reviews of 82 Nature-family papers. GPT-5.2 outperforms the top human reviewer (60.0% vs 48.2%), but AI reviewers show 16 recurring weaknesses, including limited subfield knowledge and poor long-context handling. AI reviewers complement rather than replace humans.


GPT Gemini Claude Evals Papers

Summary generated by Claude — human-verified
