arXiv cs.AI

Generative-Evaluative Agreement: A Necessary Validity Criterion for LLM-Enabled Adaptive Assessment

Signal: 72
Hype: 18
In three lines
This arXiv paper introduces Generative-Evaluative Agreement (GEA), a validity criterion that measures whether an LLM's scoring function recovers the skill levels its generative function was instructed to produce. On a two-stage adaptive assessment, the model's recovered scores correlate with the intended levels at r=0.698 (r² ≈ 0.49, roughly half of the intended variance), with a systematic positive bias. GEA is strong (r>0.7) for syntactically verifiable skills but near zero for design-level skills.
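
As a rough illustration of the kind of agreement statistic summarized above, the sketch below computes a Pearson correlation and a mean signed bias between intended and recovered skill levels. This summary does not state the paper's exact estimator, so the choice of Pearson r, the function names `pearson_r` and `gea_summary`, and the toy numbers are all assumptions made for illustration; none of the values come from the paper.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def gea_summary(intended, recovered):
    """Hypothetical GEA-style summary (an assumption, not the paper's method).

    intended:  skill levels the generator was instructed to produce
    recovered: skill levels the scorer assigned to the generated responses
    """
    r = pearson_r(intended, recovered)
    # Positive mean bias means the scorer rates higher than intended.
    bias = mean(b - a for a, b in zip(intended, recovered))
    return {"r": r, "r_squared": r * r, "mean_bias": bias}


# Toy data invented for this example only.
intended = [1, 2, 3, 4, 5]
recovered = [2, 3, 3, 5, 5]
summary = gea_summary(intended, recovered)
print(summary)
```

Squaring r gives the share of variance explained, which is why a correlation and a variance percentage should not be read interchangeably. Reporting the signed bias alongside r matters because a scorer can rank responses well while still rating all of them too high.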
Evals, Reasoning, AI safety

Summary generated by Claude — human-verified