Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications
Signal: 75
Hype: 15
In three lines
Unified survey on Pretraining Data Exposure (PDE) in LLMs, covering membership inference and data contamination. Formalizes PDE across exposure levels, reviews attack and defense methods, and identifies open challenges for evaluation integrity and privacy protection.
Summary generated by Claude — human-verified