Estimating Item Difficulty with Large Language Models as Experts
Signal: 72
Hype: 18
In three lines
A study evaluating three off-the-shelf LLMs as estimators of the difficulty of educational items when no response data is available. Across 6 primary-school mathematics domains, Spearman correlations show moderate-to-strong alignment with empirical difficulties. Pairwise comparisons outperform absolute judgements; adding token probabilities and few-shot examples improves results.
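As a rough illustration of the evaluation described above, the sketch below turns pairwise "which item is harder?" judgements into a difficulty ranking and scores it against empirical difficulties with a Spearman correlation. It is not the study's pipeline: the item difficulties, the `stub_judge` function (a stand-in for an LLM call), and the win-count aggregation are all assumptions made for this example.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def win_counts(n_items, judge):
    """Count how often each item is judged the harder one of a pair."""
    wins = [0] * n_items
    for i, j in combinations(range(n_items), 2):
        wins[judge(i, j)] += 1
    return wins


# Hypothetical empirical difficulties for four items (made-up numbers).
empirical = [0.2, 0.5, 0.9, 0.7]

# Hypothetical judge: a fixed table standing in for an LLM's pairwise answer.
# It disagrees with the empirical order on the last two items.
_judge_belief = [0.1, 0.4, 0.6, 0.8]


def stub_judge(i, j):
    return i if _judge_belief[i] > _judge_belief[j] else j


estimated = win_counts(len(empirical), stub_judge)
rho = spearman(empirical, estimated)
print(round(rho, 3))
```

Win counting is only the simplest way to aggregate pairwise judgements; a Bradley-Terry style model would be a common alternative, and nothing here implies which one the study used.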
Summary generated by Claude — human-verified