arXiv cs.CL·27 May 2026

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

In three lines: Unified survey on Pretraining Data Exposure (PDE) in LLMs, covering membership inference and data contamination. Formalizes PDE across exposure levels, reviews attack and defense methods, and identifies open challenges for evaluation integrity and privacy protection.

AI safety Alignment Evals Papers

Summary generated by Claude — human-verified

