arXiv cs.CL

PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI

Signal: 75 · Hype: 35
In three lines
PAST2HARM is an adaptive jailbreak attack that exploits past-tense reformulation of prompts to bypass safeguards in multimodal text-to-image models. Tested on Gemini Nano, GPT Image 2, and SD XL, it achieves success rates of 83%, 67%, and 100%, respectively. The attack elicits explicit sexual content, political disinformation, and hate speech from these models.
AI safety · Alignment · Vision · Evals · Benchmarks

Summary generated by Claude — human-verified