On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists
Signal: 78
Hype: 25
In three lines
An expert study (45 scientists, 469 hours) evaluated 2,960 criticisms of 82 Nature-family papers. GPT-5.2 outperformed the top human reviewer (60.0% vs 48.2%), but AI reviewers showed 16 recurring weaknesses, such as limited subfield knowledge and poor long-context handling. AI reviewers complement rather than replace humans.
Summary generated by Claude — human-verified