arXiv cs.CL

The Ghost Annotator: a Framework to Explore Human Label Variation in Content Moderation through Conformal Prediction

Signal: 72 · Hype: 18
In three lines
A framework that combines conformal prediction with a collaborative-filtering-style representation of annotators to analyze how LLMs behave relative to human annotators in content moderation. It introduces the Ghost Prediction metric to quantify divergences between models and humans. An evaluation across 4 LLMs and 4 datasets shows that larger models are more confident on texts where they align with no human annotator, revealing structural demographic bias.
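
For readers unfamiliar with conformal prediction, the following is a minimal sketch of the standard split-conformal recipe for classification, followed by a simple check of whether a model's prediction set shares any label with the human annotations. It is an illustration under stated assumptions, not the paper's method: the function names (`conformal_threshold`, `prediction_set`, `overlaps_any_annotator`) are hypothetical, the nonconformity score (one minus the probability of the true label) is one common choice among several, and the overlap check is not the paper's definition of Ghost Prediction, which the summary does not spell out.

```python
import math
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from a held-out calibration set.

    cal_probs: (n, num_classes) array of model probabilities.
    cal_labels: length-n array of reference labels (assumed here to be
    a single label per item, e.g. a majority vote).
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the reference label.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label is kept.
        return float("inf")
    return float(scores[k - 1])

def prediction_set(probs, threshold):
    """Labels whose nonconformity score does not exceed the threshold."""
    probs = np.asarray(probs, dtype=float)
    return set(np.flatnonzero(1.0 - probs <= threshold).tolist())

def overlaps_any_annotator(pred_set, human_labels):
    """True if at least one human label falls inside the prediction set."""
    return bool(pred_set & set(human_labels))
```

Under these assumptions, a small and confident prediction set that overlaps with no annotator's label is the kind of model-human divergence the summary describes; how the paper actually scores such cases is left to the source.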
Evals · AI safety · Alignment · Benchmarks

Summary generated by Claude — human-verified