arXiv cs.LG · 29 May 2026

Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

In three lines

A comparative study of reinforcement learning (RL) and supervised fine-tuning (SFT) on Qwen2.5-3B-Instruct: RL preserves the base model's internal circuits better than SFT, which adapts faster but destroys more prior capabilities. The paper proposes a metric, differential circuit vulnerability, defined at the attention-head level.

Reinforcement learning Fine-tuning Papers

Summary generated by Claude — human-verified
