PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI
Signal: 75
Hype: 35
In three lines:
PAST2HARM is an adaptive jailbreak attack that exploits past-tense reformulation to bypass safeguards in multimodal text-to-image models. Tested on Gemini Nano, GPT Image 2, and SD XL, it achieves success rates of 83%, 67%, and 100%, respectively. The attack generates explicit sexual content, political disinformation, and hate speech.
Summary generated by Claude — human-verified