Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?
Signal: 78
Hype: 18
In three lines
Comparative study of RL vs. SFT on Qwen2.5-3B-Instruct.
Reinforcement learning (RL) preserves the base model's internal circuits better than supervised fine-tuning (SFT), which adapts faster but destroys more prior capabilities.
Proposed metric: differential circuit vulnerability at the attention-head level.
Summary generated by Claude — human-verified