arXiv cs.CL·25 May 2026

Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography

Signal

Hype

In three linesSparse autoencoders decompose GPT-2 XL and Llama-3.1-8B into 16K-32K interpretable features per layer. Semantic features alone recover 94% of peak encoding performance (r=0.285) and align with known cortical semantic organization (ρ=0.72, p<0.001). Results generalize across English, Chinese, and French.

Read source

Your take?

Papers GPT Llama Evals

Summary generated by Claude — human-verified

Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography

Other angles on this story