FishBack: Pullback Fisher Geometry for Optimal Activation Steering in Transformers
FishBack is an activation-steering method for transformers based on pullback Fisher geometry. The authors show that activation space is non-Euclidean (>97% deviation on GPT-2) and derive a closed-form optimal steering equation. The method outperforms CAA, ActAdd, and ITI by 1.3×–2.5× on off-target KL reduction.
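
To make the idea of a pullback Fisher metric concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and the closed-form step it uses is the textbook solution of a constrained quadratic problem, not necessarily the paper's steering equation. Everything in it is an assumption made for illustration: a toy linear readout W stands in for the rest of the network, total KL change in the output distribution stands in for off-target effects, and the names pullback_fisher and min_kl_step are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kl(p, q):
    """KL(p || q) for two strictly positive probability vectors."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

def pullback_fisher(W, h):
    """Fisher metric of the output distribution, pulled back to activation h.

    Assumption: a toy linear readout, logits = W @ h. For a categorical
    distribution in logit coordinates the Fisher matrix is
    diag(p) - p p^T, and the Jacobian of the logits w.r.t. h is W.
    """
    p = softmax(W @ h)
    F = np.diag(p) - np.outer(p, p)
    return W.T @ F @ W

def min_kl_step(G, g, c, damping=1e-8):
    """Minimize 0.5 * d^T G d subject to g^T d = c (Lagrange multipliers).

    Textbook constrained quadratic, used here only as an illustration.
    The damping term is a hypothetical safeguard for a near-singular G.
    """
    Ginv_g = np.linalg.solve(G + damping * np.eye(G.shape[0]), g)
    return c * Ginv_g / (g @ Ginv_g)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))              # toy readout: 8 tokens, 4-dim activation
h = rng.normal(size=4)
G = pullback_fisher(W, h)

# For a small perturbation, KL is approximately 0.5 * delta^T G delta.
delta = 1e-3 * rng.normal(size=4)
kl_exact = kl(softmax(W @ h), softmax(W @ (h + delta)))
kl_quad = 0.5 * delta @ G @ delta

# Steer toward token t: g is the gradient of log p_t with respect to h.
t, c = 3, 0.1
p = softmax(W @ h)
g = W.T @ (np.eye(8)[t] - p)
step_fisher = min_kl_step(G, g, c)       # metric-aware step
step_euclid = c * g / (g @ g)            # shortest Euclidean step, same constraint
cost_fisher = 0.5 * step_fisher @ G @ step_fisher
cost_euclid = 0.5 * step_euclid @ G @ step_euclid
```

Both steps raise the target log-probability by the same first-order amount c, but the metric-aware step has no larger quadratic KL cost than the Euclidean one. In a real transformer the Jacobian from an intermediate activation to the logits is not a fixed matrix, so G would have to be estimated at each activation rather than read off from W.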