idSCD: Identifying Training Datasets through Semantic Correlation Descriptors
Signal: 72
Hype: 15
In three lines: A new method identifies whether a dataset was used to train a model by analyzing the semantic correlation descriptors (SCDs) that the model learns internally. This white-box approach outperforms the black-box baselines RMIA and LiRA, with gains of up to 60% in ROC-AUC on NLI, emotion, and medical text classification tasks.
Summary generated by Claude — human-verified