arXiv cs.CL · 19 May 2026

Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs

In three lines: This study shows that unlearning in LLMs merely suppresses information at the surface level, and that models recover their original behavior after minimal fine-tuning. The authors introduce a representation-level analysis framework (PCA, CKA, Fisher information) to assess whether data has genuinely been erased, and they identify four forgetting regimes defined by reversibility and catastrophicity.
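
For readers unfamiliar with CKA, one of the representation-level tools named in the summary, the following is a minimal, dependency-free sketch of linear CKA (centered kernel alignment) between two activation matrices. It is not the paper's code: the function names and the toy "before" and "after" activations are hypothetical, and the paper's exact CKA variant and setup are not stated in this summary. A score near 1 means the two representations are nearly the same up to rotation and scaling.

```python
import math

def center_columns(m):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]

def cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for row-aligned matrices a and b."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total

def linear_cka(x, y):
    """Linear CKA for two matrices with one row per input example.

    Assumes neither matrix is constant across rows; otherwise the
    denominator is zero and the score is undefined.
    """
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_frobenius_sq(xc, yc)
    denominator = math.sqrt(
        cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc)
    )
    return numerator / denominator

# Hypothetical toy activations (4 inputs, 2 hidden units), not real model data.
before_unlearning = [[1.0, 2.0], [3.0, 4.0], [5.0, 7.0], [2.0, 0.0]]
after_unlearning = [[1.1, 1.9], [2.9, 4.2], [5.2, 6.8], [2.1, 0.1]]
print(linear_cka(before_unlearning, after_unlearning))
```

In this kind of analysis, a layer whose score stays close to 1 after unlearning would suggest its representations changed little, which is consistent with suppression rather than erasure; that reading is an illustration, not a result from the paper.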

Papers AI safety Alignment Evals

Summary generated by Claude — human-verified
