arXiv cs.AI

Reverse-Engineering Model Editing on Language Models

Signal: 82
Hype: 15
In three lines: Researchers reveal a critical vulnerability in locate-then-edit model editing methods: the parameter updates themselves let attackers recover the edited data through the KSTER attack, which exploits the updates' low-rank structure. A proposed defense, subspace camouflage, obfuscates these fingerprints without compromising editing utility.
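As a rough intuition for why a low-rank update can leak information, the toy sketch below assumes an edit that changes a weight matrix by a rank-one term `v kᵀ`. This is not the KSTER attack and is not taken from the paper; the vectors `k` and `v`, their sizes, and the helper names are all hypothetical. It only shows that anyone who sees such an update can read off the direction of `k` from it.

```python
import math
import random


def outer(v, k):
    """Return the rank-one matrix v k^T as a list of rows."""
    return [[vi * kj for kj in k] for vi in v]


def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(xi * xi for xi in x))


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b)) / (norm(a) * norm(b))


def recover_key_direction(delta_w):
    """Every nonzero row of a rank-one matrix v k^T is a multiple of k,
    so the largest-norm row gives k's direction (up to sign and scale)."""
    row = max(delta_w, key=norm)
    n = norm(row)
    return [x / n for x in row]


rng = random.Random(0)
k = [rng.gauss(0, 1) for _ in range(8)]  # hypothetical "key" vector tied to the edit
v = [rng.gauss(0, 1) for _ in range(6)]  # hypothetical "value" direction written by the edit
delta_w = outer(v, k)                    # toy rank-one weight update (assumption)

k_hat = recover_key_direction(delta_w)
similarity = abs(cosine(k_hat, k))       # close to 1 when the direction is recovered
print(round(similarity, 6))
```

Recovering a direction in weight space is only the first step; mapping it back to the edited text would need access to the model and is outside this sketch.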
AI safety, Alignment, Papers

Summary generated by Claude — human-verified