arXiv cs.LG

idSCD: Identifying Training Datasets through Semantic Correlation Descriptors

Signal: 72 · Hype: 15
In three lines: A new method identifies whether a dataset was used to train a model by analyzing the semantic correlation descriptors (SCDs) the model learns internally. This white-box approach outperforms the black-box baselines RMIA and LiRA, with ROC-AUC gains of up to 60%. The evaluation covers natural language inference (NLI), emotion classification, and medical text classification tasks.
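The comparison is reported in ROC-AUC. As a generic illustration of that metric (not the paper's SCD method, which the summary does not detail), here is a minimal sketch that computes ROC-AUC from detection scores. It assumes a detector assigns higher scores to datasets that were used in training; the scores are hypothetical, and `roc_auc` is an illustrative helper, not an API from the paper.

```python
def roc_auc(member_scores, nonmember_scores):
    """ROC-AUC as the probability that a randomly chosen training-set ("member")
    score exceeds a randomly chosen non-member score; ties count as one half."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical detector scores for datasets used / not used in training.
members = [0.9, 0.8, 0.75, 0.6]
nonmembers = [0.7, 0.5, 0.4, 0.3]
print(roc_auc(members, nonmembers))  # 15 of 16 pairs ranked correctly -> 0.9375
```

A score of 0.5 means the detector does no better than chance, and 1.0 means every training dataset outranks every unseen one.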
Papers · AI safety · Evals

Summary generated by Claude; human-verified.