OpenAI Blog·13 June 2022

AI-written critiques help humans notice flaws

In three lines: OpenAI trains critique-writing models to describe flaws in summaries. Human evaluators detect significantly more flaws when shown AI-generated critiques. Larger models are better at self-critique, and scale improves critique-writing more than it improves summary-writing.

OpenAI Evals Alignment Reasoning

Summary generated by Claude — human-verified
