Single-Sample Black-Box Membership Inference Attack against Vision-Language Models via Cross-modal Semantic Alignment
Signal: 72
Hype: 25
In three lines: A novel black-box, single-sample membership inference attack against Vision-Language Models. It exploits cross-modal semantic alignment: training data exhibits stronger image-caption alignment than non-member data. It achieves an AUC of 0.821 against LLaVA-1.5 on the VL-MIA/Flickr dataset.
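To make the idea concrete, here is a minimal sketch of how an alignment-based membership score and its AUC could be computed. It is not the paper's implementation. The summary does not say how the alignment score is obtained from the model, so the sketch assumes it is a cosine similarity between two embedding vectors. The function names (`cosine_similarity`, `auc`, `predict_member`) and all numbers are hypothetical illustrations.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed alignment score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def auc(member_scores, non_member_scores):
    """Rank-based AUC: probability that a random member outscores a random
    non-member, with ties counted as one half."""
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


def predict_member(score, threshold):
    """Single-sample decision: flag as a training member if alignment is high."""
    return score >= threshold


# Hypothetical alignment scores, invented purely for illustration.
toy_member_scores = [0.9, 0.8, 0.7]
toy_non_member_scores = [0.6, 0.75, 0.3]
toy_auc = auc(toy_member_scores, toy_non_member_scores)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is the scale on which the reported 0.821 should be read. The threshold passed to `predict_member` is a free parameter in this sketch; the summary does not state how it is chosen.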
Summary generated by Claude — human-verified