Confidence Calibration in Large Language Models
Signal: 72
Hype: 18
In three lines
A preregistered study shows that current LLMs are overconfident: on average, their stated confidence exceeds their accuracy.
A hard-easy effect moderates this bias: overconfidence peaks on difficult tasks, while easy tasks show substantial underconfidence.
The study introduces LifeEval, a benchmark for evaluating model calibration across difficulty levels.
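To make the "confidence exceeds accuracy" reading concrete, here is a minimal sketch of one common way to score it: mean stated confidence minus observed accuracy, computed separately per difficulty group. The function name `overconfidence` and all numbers are hypothetical toy values chosen for illustration; they are not taken from the study or from LifeEval, and the study's own metric may differ.

```python
from statistics import mean


def overconfidence(confidences, correct):
    """Mean stated confidence minus observed accuracy.

    Positive values indicate overconfidence, negative values underconfidence.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equally sized, non-empty inputs")
    accuracy = mean(1.0 if c else 0.0 for c in correct)
    return mean(confidences) - accuracy


# Hypothetical toy data (NOT from the study): stated confidence per answer
# and whether that answer was correct, grouped by task difficulty.
hard_conf = [0.9, 0.8, 0.9, 0.8]
hard_correct = [True, False, False, True]
easy_conf = [0.7, 0.8, 0.7, 0.8]
easy_correct = [True, True, True, True]

hard_gap = overconfidence(hard_conf, hard_correct)
easy_gap = overconfidence(easy_conf, easy_correct)

print(f"hard tasks gap: {hard_gap:+.2f}")
print(f"easy tasks gap: {easy_gap:+.2f}")
```

In this toy setup the hard group has a positive gap and the easy group a negative one, which is the sign pattern a hard-easy effect would produce.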
Summary generated by Claude — human-verified