arXiv cs.AI

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

Signal: 75
Hype: 25
In three lines: An empirical study of LLM-generated reviews of scientific papers, using ACL Rolling Review 2025 data. Findings: alignment between LLM and human reviews is limited, and LLM reviews vary substantially across prompts and models. Authors can 'game' LLM reviews through iterative revision workflows, raising scores for up to 35% of the tested papers.
Evals, Benchmarks, Alignment, AI safety

Summary generated by Claude — human-verified