arXiv cs.AI · 19 May 2026

Position: AI Evaluations Should be Grounded on a Theory of Capability


In three lines: A position paper arguing that AI model evaluations should be grounded in an explicit theory of capability rather than treating scores as direct measurements. The authors empirically demonstrate that reported performance depends strongly on the evaluator's modeling assumptions, and they propose an 'Evaluation Card' to document the underlying decisions.


Evals Benchmarks

Summary generated by Claude — human-verified

