Distinguishable Deletion: Unifying Knowledge Erasure and Refusal for Large Language Model Unlearning
Signal: 72
Hype: 25
In three lines
Distinguishable Deletion (D²) unifies knowledge deletion and refusal for LLM unlearning. The method uses an energy index to erase undesirable knowledge in latent representations rather than specific tokens, avoiding biased deletion and re-emergence of harmful content. Energy-based Unlearning Alignment (EUA) applies this mechanism at training and inference.
Summary generated by Claude — human-verified