arXiv cs.CL

FishBack: Pullback Fisher Geometry for Optimal Activation Steering in Transformers

Signal: 78
Hype: 15
In three lines: FishBack proposes an activation-steering method for transformers based on pullback Fisher geometry. The authors show that activation space is non-Euclidean (>97% deviation on GPT-2) and derive a closed-form optimal steering equation. The method outperforms CAA, ActAdd, and ITI by 1.3×–2.5× on off-target KL reduction.
Tags: Reasoning, Papers, Benchmarks

Summary generated by Claude — human-verified