OpenAI Blog·25 September 2025

Measuring the performance of our models on real-world tasks

In three lines: OpenAI introduces GDPval, a new evaluation framework that measures model performance on economically valuable real-world tasks across 44 occupations. The benchmark assesses practical capabilities on professional use cases rather than traditional academic benchmarks.

OpenAI Benchmarks Evals

Summary generated by Claude — human-verified
