arXiv cs.AI

JobBench: Aligning Agent Work With Human Will

Signal: 78 · Hype: 25
In three lines
JobBench evaluates 36 AI models on 130 real professional tasks spanning 35 occupations; Claude Opus, for example, scores 45.9%. Unlike existing benchmarks that focus on economic value, JobBench prioritizes the workflows that experts identify as high priority for delegation, favoring human augmentation over replacement.
AI Agents · Benchmarks · Claude · Evals

Summary generated by Claude; human-verified.