arXiv cs.AI

Are Sparse Autoencoder Benchmarks Reliable?

Signal: 75
Hype: 15
In three lines: A critical audit of SAEBench, the de facto standard evaluation suite for sparse autoencoders (SAEs). The TPP and SCR metrics fail multiple reliability tests and should not be used. Other metrics show higher reseed noise and lower discriminability than assumed. Only sae-probes demonstrates acceptable reliability, though it struggles to distinguish architecture variants.
Evals, Benchmarks, Papers, AI safety, Alignment

Summary generated by Claude — human-verified