arXiv cs.AI

Estimating Item Difficulty with Large Language Models as Experts

Signal: 72
Hype: 18
In three lines
This study evaluates three off-the-shelf LLMs as estimators of the difficulty of educational items when no response data are available. Across 6 primary-school mathematics domains, Spearman correlations show moderate-to-strong alignment between the LLM estimates and empirical difficulties. Pairwise comparisons outperform absolute judgements, and adding token probabilities and few-shot examples improves results.
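
As an illustration of the evaluation described in the summary, here is a minimal sketch, not the paper's implementation. It assumes each pairwise judgement is expressed as a probability that one item is harder than another (for example, derived from token probabilities), aggregates those probabilities into a per-item score, and compares the resulting ranking with empirical difficulties using Spearman's rank correlation. The function names, the win-rate aggregation rule, and all numbers are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def win_rates(n_items, judgements):
    """Aggregate pairwise judgements into one difficulty score per item.

    judgements maps (i, j) to the assumed probability that item i is
    harder than item j. The score is the mean probability of being the
    harder item across all comparisons the item took part in.
    """
    totals = [0.0] * n_items
    counts = [0] * n_items
    for (i, j), p_i_harder in judgements.items():
        totals[i] += p_i_harder
        totals[j] += 1.0 - p_i_harder
        counts[i] += 1
        counts[j] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Toy, made-up data for four items (not taken from the study).
empirical_difficulty = [0.2, 0.5, 0.9, 0.7]
pairwise = {
    (0, 1): 0.2, (0, 2): 0.1, (0, 3): 0.3,
    (1, 2): 0.3, (1, 3): 0.4, (2, 3): 0.7,
}
estimated = win_rates(4, pairwise)
rho = spearman(estimated, empirical_difficulty)
print(f"Spearman rho: {rho:.3f}")
```

Win-rate averaging is only one simple way to turn pairwise judgements into a ranking; a Bradley-Terry style model would be a natural alternative.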
Prompt engineering · Evals · Benchmarks

Summary generated by Claude — human-verified