arXiv cs.AI·19 May 2026

Are Sparse Autoencoder Benchmarks Reliable?

In three lines

Critical audit of SAEBench, the de facto standard evaluation suite for sparse autoencoders (SAEs). The TPP and SCR metrics fail multiple reliability tests and should not be used. Other metrics show higher reseed noise and lower discriminability than assumed. Only sae-probes demonstrates acceptable reliability, but it struggles to distinguish architecture variants.

Evals Benchmarks Papers AI safety Alignment

Summary generated by Claude — human-verified
