Language-Switching Triggers Take a Latent Detour Through Language Models
Signal: 78
Hype: 15
In three lines
Circuit analysis of a backdoor in an 8B model: a 3-word Latin trigger redirects English output to French. The circuit operates in 3 phases via attention heads, propagates through a subspace orthogonal to natural language-identity directions, then converts via an MLP. A single serial bottleneck position controls the entire flow.
Summary generated by Claude — human-verified