Reddit r/MachineLearning · 25 May 2026

The famous METR AI time horizons graph contains numerous severe errors [D]


In three lines: Nathan Witkin (NYU Stern) harshly critiques METR's AI time horizons graph. The errors he identifies include human baselines that were estimated rather than measured, hourly-paid benchmarkers with an incentive to work slowly, a sample biased toward the authors' peers, and a failure to account for the familiarity advantage (5-18x faster). Witkin concludes that the graph contains too many compounding errors to be salvaged.


Benchmarks Evals AI safety

Summary generated by Claude — human-verified

