arXiv cs.AI

Position: AI Evaluations Should be Grounded on a Theory of Capability

Signal: 72 · Hype: 15
In three lines: A position paper arguing that AI model evaluations should be grounded in an explicit theory of capability rather than treating scores as direct measurements. The authors empirically demonstrate that reported performance depends strongly on the evaluator's modeling assumptions, and they propose an 'Evaluation Card' to document the decisions underlying an evaluation.
Evals, Benchmarks

Summary generated by Claude — human-verified