arXiv cs.CL · 19 May 2026

PARALLAX: Separating Genuine Hallucination Detection from Benchmark Construction Artifacts


In three lines: PARALLAX shows that 4 of 6 major hallucination detection benchmarks embed the ground-truth answer in the prompt, which lets a naive baseline (TxTemb) achieve near-perfect detection without access to model internals. The paper evaluates 22 methods across 12 open-source models. Under controlled conditions most methods fail; the exceptions are SAPLMA and DRIFT, both supervised probes on upper-layer hidden states.
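To make the leakage argument concrete, here is a minimal sketch of how a detector that only reads text can look strong when the prompt already contains the answer. It is an illustrative assumption, not the paper's TxTemb baseline: the token-overlap score, the function names (`leak_score`, `flag_hallucination`), the threshold, and the example prompts are all hypothetical.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def leak_score(prompt, response):
    """Fraction of response tokens that also occur in the prompt."""
    resp = tokens(response)
    if not resp:
        return 0.0
    return len(resp & tokens(prompt)) / len(resp)


def flag_hallucination(prompt, response, threshold=0.5):
    """Hypothetical text-only detector: flag responses the prompt does not support."""
    return leak_score(prompt, response) < threshold


# Leaky prompt: the ground-truth answer ("Paris") sits in the context.
leaky_prompt = "Context: The Eiffel Tower is in Paris. Question: Where is the Eiffel Tower?"
# Controlled prompt: same question, answer removed.
clean_prompt = "Question: Where is the Eiffel Tower?"

for name, prompt in [("leaky", leaky_prompt), ("clean", clean_prompt)]:
    for response in ["Paris", "Berlin"]:
        print(name, response, flag_hallucination(prompt, response))
```

With the leaky prompt the detector separates the correct answer from the wrong one using surface text alone. With the controlled prompt it flags both responses, so it carries no signal once the answer is removed from the prompt.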



Summary generated by Claude — human-verified

