Reddit r/MachineLearning·1 June 2026

How much of MLE-Bench's gains are the algorithm vs. better models + more search? [R]

In three lines

MLE-Bench results show 80% gains over two years, but new research (FML-Bench) suggests that little of that improvement comes from genuine algorithmic progress. With an equal step budget and identical models, the two-year-old AIDE algorithm matches modern agent and evolutionary search systems. FML-Bench standardizes the code-editing agent, the definition of a step, and the val/test splits so that algorithmic efficiency can be benchmarked directly.
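For readers who want the "equal step budget" idea spelled out, here is a minimal Python sketch. It is not FML-Bench's code: `Candidate`, `run_with_budget`, `compare`, and the two toy search functions are hypothetical names invented for illustration, and every score is a made-up placeholder. The only point is the protocol shape: each search method gets the same number of steps, the winner is chosen on validation score, and only that winner's test score is reported.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Candidate:
    """One solution produced by a single search step (hypothetical type)."""
    name: str
    val_score: float   # used for selection
    test_score: float  # reported only for the selected candidate


# A search method is modeled as "step index -> candidate" (an assumption).
SearchFn = Callable[[int], Candidate]


def run_with_budget(search: SearchFn, step_budget: int) -> Candidate:
    """Run `search` for exactly `step_budget` steps and select by validation."""
    if step_budget <= 0:
        raise ValueError("step_budget must be positive")
    candidates = [search(step) for step in range(step_budget)]
    # Selection never looks at test_score, so the test split stays held out.
    return max(candidates, key=lambda c: c.val_score)


def compare(methods: Dict[str, SearchFn], step_budget: int) -> Dict[str, float]:
    """Give every method the same budget and report each winner's test score."""
    return {
        label: run_with_budget(search, step_budget).test_score
        for label, search in methods.items()
    }


# Toy stand-ins with placeholder numbers, not real benchmark results.
def toy_search_a(step: int) -> Candidate:
    return Candidate(f"a-{step}", 0.50 + 0.01 * step, 0.48 + 0.01 * step)


def toy_search_b(step: int) -> Candidate:
    return Candidate(f"b-{step}", 0.55 - 0.01 * step, 0.50)


if __name__ == "__main__":
    print(compare({"method_a": toy_search_a, "method_b": toy_search_b}, step_budget=5))
```

Holding the step budget and the underlying model fixed is what lets a difference in reported test score be attributed to the search algorithm rather than to extra compute or a stronger model.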

Summary generated by Claude — human-verified
