Reddit r/MachineLearning

The famous METR AI time horizons graph contains numerous severe errors [D]

Signal: 75
Hype: 45
In three lines: Nathan Witkin (NYU Stern) harshly critiques METR's AI time horizons graph. The errors he identifies include human baselines that were estimated rather than measured, hourly-paid benchmarkers with an incentive to work slowly, a sample biased toward the authors' peers, and a failure to account for the familiarity advantage (5-18x faster). Witkin concludes that the graph contains too many compounding errors to be salvaged.
Benchmarks · Evals · AI safety

Summary generated by Claude — human-verified