arXiv cs.CL · June 3, 2026

The Ghost Annotator: a Framework to Explore Human Label Variation in Content Moderation through Conformal Prediction

In 3 lines: A framework that combines conformal prediction with collaborative representation to analyze how LLMs behave relative to human annotators in content moderation. It introduces the Ghost Prediction metric to quantify divergences between model and human labels. An evaluation on 4 LLMs and 4 datasets finds that larger models are more confident on texts lacking human alignment, with a structural demographic bias.

Evaluations, AI Safety, Alignment, Benchmarks

Summary generated by Claude, verified by a human
