Back to feed
arXiv cs.AI·

Agents' Last Exam

Signal
82
Hype
25
In three linesAgents' Last Exam (ALE) is a benchmark evaluating AI agents on long-horizon, economically valuable real-world tasks. Developed with 250+ industry experts, it covers 1K+ tasks across 13 industry clusters in non-physical sectors. Average full pass rate is 2.6% on the hardest tier.
Read source
Your take?
AI AgentsBenchmarksEvals

Summary generated by Claude — human-verified