OpenAI Blog

Why we no longer evaluate SWE-bench Verified

Signal: 75
Hype: 25
In three lines: OpenAI stops evaluating on SWE-bench Verified, citing contamination and poor measurement of frontier coding progress. Analysis reveals flawed tests and training leakage. OpenAI recommends SWE-bench Pro instead.
OpenAI, Benchmarks, Code generation, Evals

Summary generated by Claude — human-verified