arXiv cs.CL

AI Rater Discrimination Depends on Scoring Protocol in Complex Clinical Decision-Making

Signal: 78 · Hype: 15
In three lines
A factorial study of four open-source LLMs rating clinical decisions in type 2 diabetes pharmacotherapy. As AI raters, the LLMs award 74–78 points under a rubric-free protocol versus 7.69–49.64 points under an anchored Gold Rubric. The rubric amplifies discrimination between clinical decision support system (CDSS) models by 1.76–5.10× and reveals behavioral variation that is suppressed without it.
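
To make the "amplifies discrimination" figure concrete, here is a minimal Python sketch of one way such a ratio could be computed. It assumes discrimination means the spread (max minus min) of mean rater scores across CDSS models, which may differ from the paper's actual definition. The function names, model labels, and scores are all hypothetical and not taken from the paper.

```python
from statistics import mean


def score_spread(scores_by_model):
    """Spread (max - min) of the mean rater score across CDSS models."""
    means = [mean(scores) for scores in scores_by_model.values()]
    return max(means) - min(means)


def amplification(rubric_free, rubric):
    """Between-model spread under the rubric divided by the rubric-free spread."""
    base = score_spread(rubric_free)
    if base == 0:
        raise ValueError("rubric-free spread is zero; ratio is undefined")
    return score_spread(rubric) / base


# Hypothetical toy scores for two made-up CDSS models (NOT data from the paper).
rubric_free = {"cdss_a": [76, 78], "cdss_b": [74, 76]}
rubric = {"cdss_a": [40, 44], "cdss_b": [30, 34]}

ratio = amplification(rubric_free, rubric)
print(f"amplification: {ratio:.2f}x")
```

A ratio above 1 would mean the rubric separates the models more widely than rubric-free scoring does. Other measures of discrimination, such as effect sizes or variance ratios, would give different numbers.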
Evals · Benchmarks · AI safety · Alignment

Summary generated by Claude — human-verified