Back to feed
arXiv cs.LG·

Spectral Unforgetting: Post-Hoc Recovery of Damaged Capabilities Without Retraining

Signal
78
Hype
18
In three linesDG-Hard, a spectral post-hoc method, recovers capabilities damaged by fine-tuning without retraining. It applies hard SVD thresholding (Donoho-Gavish) to weight matrices to isolate task-aligned signal from residual noise. Tested on 14 (model, task) settings and 9 benchmarks, it also restores safety alignment degraded by benign fine-tuning.
Read source
Your take?
Fine-tuningAI safetyAlignmentPapers

Summary generated by Claude — human-verified