arXiv cs.CL

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

Signal: 78 · Hype: 25
In three lines: An expert study (45 scientists, 469 hours) evaluated 2,960 criticisms from 82 Nature-family papers. GPT-5.2 outperforms the top human reviewer (60.0% vs 48.2%), but AI reviewers show 16 recurring weaknesses, such as limited subfield knowledge and poor long-context handling. AI reviewers complement rather than replace humans.
GPT · Gemini · Claude · Evals · Papers

Summary generated by Claude — human-verified