arXiv cs.LG

Amplifying, Not Learning: Fine-Tuned AI Text Detectors Amplify a Pretrained Direction

Signal: 82 · Hype: 15
In three lines: AI text detectors amplify a pretrained typicality axis rather than construct an AI-vs-human boundary. On RoBERTa-base, raw projection onto centroid(AI) - centroid(HC3) achieves AUROC 0.806-0.944, matching or exceeding fine-tuning. A closed-form Jacobian predictor transfers to 16/16 third-party detectors with oracle-equivalence, reducing FPR by 57% on the OpenAI detector.
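To make the "raw projection" idea concrete, here is a minimal sketch of scoring by projection onto a difference of class centroids and measuring AUROC. It is not the paper's code: synthetic Gaussian vectors stand in for RoBERTa-base embeddings, and the names `centroid_direction` and `auroc` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def centroid_direction(ai_emb, human_emb):
    """Unit vector pointing from the human centroid to the AI centroid."""
    d = ai_emb.mean(axis=0) - human_emb.mean(axis=0)
    return d / np.linalg.norm(d)


def auroc(scores_pos, scores_neg):
    """AUROC as the probability that a positive outscores a negative (ties count half)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())


# ASSUMPTION: synthetic data. Real use would substitute pooled encoder
# embeddings of AI-written and human-written texts.
rng = np.random.default_rng(0)
dim = 32
shift = np.zeros(dim)
shift[0] = 1.5  # the two classes differ along a single axis

human_train = rng.normal(size=(500, dim))
ai_train = rng.normal(size=(500, dim)) + shift
human_test = rng.normal(size=(500, dim))
ai_test = rng.normal(size=(500, dim)) + shift

# Fit the direction on the training split only, then score held-out data.
direction = centroid_direction(ai_train, human_train)
score = auroc(ai_test @ direction, human_test @ direction)
print(f"projection AUROC: {score:.3f}")
```

No classifier is trained here: the only fitted quantity is one direction vector, which is what makes a comparison against a fine-tuned detector informative.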
Tags: Evals, Benchmarks, AI safety, Alignment

Summary generated by Claude — human-verified