arXiv cs.AI · 20 May 2026

Generative-Evaluative Agreement: A Necessary Validity Criterion for LLM-Enabled Adaptive Assessment

In three lines: An arXiv paper introduces Generative-Evaluative Agreement (GEA), a validity criterion that measures whether an LLM's scoring function recovers the skill levels its generative function was instructed to produce. On a two-stage adaptive assessment, the model's scores correlate with the intended skill levels at r=0.698 (roughly 49% of variance, since r²≈0.487), with a systematic positive bias. GEA is strong (r>0.7) for syntactically verifiable skills but near zero for design-level skills.
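
To make the two quantities in the summary concrete, here is a minimal Python sketch of how an agreement check of this kind could be computed. It is not the paper's method: the function names `pearson_r` and `gea_summary` and the five-item data set are hypothetical, chosen only to show that correlation and bias are separate measurements. The final line is plain arithmetic relating a correlation of 0.698 to variance explained.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def gea_summary(intended, scored):
    """Hypothetical helper: agreement (r, r^2) and mean signed bias.

    intended: skill levels the generator was instructed to produce
    scored:   skill levels the same model assigned when scoring
    """
    r = pearson_r(intended, scored)
    mean_bias = sum(s - i for i, s in zip(intended, scored)) / len(intended)
    return {"r": r, "r_squared": r * r, "mean_bias": mean_bias}


# Made-up illustration data, not results from the paper:
# every score is exactly one level above the intended level.
intended = [1, 2, 3, 4, 5]
scored = [2, 3, 4, 5, 6]
summary = gea_summary(intended, scored)
print(summary)

# Variance explained is r squared, not r.
print(round(0.698 ** 2, 3))  # 0.487
```

In this toy case the correlation is perfect while the mean bias is a full level, which is why a criterion like GEA would need to report both numbers rather than the correlation alone.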

Evals, Reasoning, AI safety

Summary generated by Claude — human-verified
