Reddit r/MachineLearning

How much of MLE-Bench's gains are the algorithm vs. better models + more search? [R]

Signal: 72, Hype: 25
In three lines: MLE-Bench shows 80% gains over two years, but new research (FML-Bench) finds that little of that improvement comes from real algorithmic progress. With an equal step budget and identical models, the two-year-old AIDE algorithm matches modern agentic and evolutionary search systems. FML-Bench standardizes the code-editing agent, the step definition, and the validation/test splits so that it can benchmark algorithmic efficiency in isolation.
Benchmarks, AI Agents, Evals, Papers

Summary generated by Claude — human-verified