arXiv cs.CL

Augmenting Human Evaluation with LLM Judges: How Many Human Reviews Do You Need?

Signal: 75
Hype: 15
In three lines
An arXiv paper proposes a formal framework for combining LLM and human evaluations.
It uses a doubly robust estimator, drawn from the missing-data literature, to determine the optimal number of human reviews needed.
In its two-stage sampling design, the LLM judge shifts from a substitute for human reviewers to an auxiliary source of information.
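The summary does not spell out the paper's estimator, so the following is only a minimal sketch of a generic doubly robust (augmented inverse-probability-weighted) mean. It assumes every item has an LLM judge score and that a random subset was sent to human review with a known probability `pi`. The function name, parameter names, and the simple random sampling setup are all illustrative assumptions, not the paper's method.

```python
from statistics import fmean


def doubly_robust_mean(llm_scores, human_scores, sampled, pi):
    """Estimate the mean human score from LLM scores plus a human-reviewed subset.

    Hypothetical helper for illustration only.

    llm_scores   -- LLM judge score for every item (stage one)
    human_scores -- human score for reviewed items; ignored (may be None) elsewhere
    sampled      -- True where the item was sent to human review (stage two)
    pi           -- assumed known probability that an item is human-reviewed
    """
    if not 0 < pi <= 1:
        raise ValueError("pi must be in (0, 1]")
    n = len(llm_scores)
    # Plug-in term: the LLM judge's average over all items.
    plug_in = fmean(llm_scores)
    # Correction term: inverse-probability-weighted human-minus-LLM residuals,
    # computed only on the reviewed items but averaged over all n items.
    correction = sum(
        (h - f) / pi
        for f, h, r in zip(llm_scores, human_scores, sampled)
        if r
    ) / n
    return plug_in + correction
```

With `pi = 1.0` and every item reviewed, the estimate reduces to the plain human average. As fewer items are reviewed, the LLM scores carry more of the estimate and the reviewed residuals correct the judge's bias. How many reviews that correction needs is the question the paper addresses.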
Evals, Benchmarks, AI safety, Alignment

Summary generated by Claude — human-verified