Position: AI Evaluations Should be Grounded on a Theory of Capability
Signal: 72 · Hype: 15
In three lines: A position paper arguing that AI model evaluations should be grounded in an explicit theory of capability rather than treating scores as direct measurements. The authors empirically demonstrate that reported performance depends strongly on the evaluator's modeling assumptions, and they propose an "Evaluation Card" for documenting the underlying decisions.
Summary generated by Claude — human-verified