Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs
Signal
78
Hype
25
In three lines
Cross-lingual red-teaming study of four MLLMs (Claude Sonnet 4.5, GPT-5, Pixtral Large, Qwen Omni) showing that jailbreak vulnerability varies by language. Role-play attacks are less effective in Mexican Spanish, while visual attacks are more effective. Safety rankings do not transfer across languages.
Summary generated by Claude — human-verified