arXiv cs.LG · 22 May 2026

Amplifying, Not Learning: Fine-Tuned AI Text Detectors Amplify a Pretrained Direction

In three lines: AI text detectors amplify a pretrained typicality axis rather than construct an AI-vs-human boundary. On RoBERTa-base, the raw projection onto the direction centroid(AI) − centroid(HC3) achieves an AUROC of 0.806–0.944, matching or exceeding fine-tuning. A closed-form Jacobian predictor transfers to all 16 of 16 third-party detectors tested with oracle-equivalent performance, and it reduces the false positive rate (FPR) by 57% on the OpenAI detector.
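The projection baseline can be illustrated with a small sketch. It assumes text embeddings have already been extracted from the encoder (how RoBERTa-base hidden states are pooled is not stated in the summary, so that step is left out), uses hypothetical toy 2-D vectors in place of real embeddings, and treats the HC3 centroid as the human-text reference; that last point is our reading of the summary, not a stated detail. The score is the dot product with centroid(AI) − centroid(reference), and AUROC is computed as the probability that a random AI text outscores a random human text. All function and variable names are illustrative.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty set of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def centroid_direction(ai: Sequence[Vector], reference: Sequence[Vector]) -> Vector:
    """Difference of centroids: centroid(AI) minus centroid(reference set)."""
    return [a - r for a, r in zip(centroid(ai), centroid(reference))]


def projection_score(x: Vector, direction: Vector) -> float:
    """Raw, unnormalized dot product of one embedding with the direction."""
    return sum(xi * di for xi, di in zip(x, direction))


def auroc(positive: Sequence[float], negative: Sequence[float]) -> float:
    """AUROC as P(random positive score > random negative score); ties count 1/2."""
    wins = 0.0
    for p in positive:
        for q in negative:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive) * len(negative))


# Hypothetical toy 2-D "embeddings" standing in for real encoder features.
train_ai = [[2.0, 1.0], [3.0, 0.0]]
train_reference = [[0.0, 1.0], [1.0, 0.0]]
test_ai = [[2.5, 0.3], [1.8, 0.9]]
test_human = [[0.2, 0.7], [1.2, 0.1]]

direction = centroid_direction(train_ai, train_reference)
ai_scores = [projection_score(x, direction) for x in test_ai]
human_scores = [projection_score(x, direction) for x in test_human]
print(direction, auroc(ai_scores, human_scores))  # prints [2.0, 0.0] 1.0
```

No detector weights are trained here: the direction comes from class means of frozen representations alone, which is the kind of training-free score the summary reports as matching or exceeding fine-tuning. On real data, the pairwise loop would typically be replaced by scikit-learn's `roc_auc_score`.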

Tags: Evals, Benchmarks, AI safety, Alignment

Summary generated by Claude; human-verified.
