The famous METR AI time horizons graph contains numerous severe errors [D]
Nathan Witkin (NYU Stern) harshly critiques METR's AI time horizons graph. The errors he identifies include human baselines that were estimated rather than measured, hourly-paid benchmarkers who were incentivized to work slowly, a sample biased toward the authors' peers, and a failure to account for the familiarity advantage (5-18x faster). Witkin concludes that the graph contains too many compounding errors to be salvaged.