arXiv cs.CL · 25 May 2026

Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs

In three lines: Cross-lingual red-teaming study of four MLLMs (Claude Sonnet 4.5, GPT-5, Pixtral Large, Qwen Omni) showing that jailbreak vulnerability varies by language. Role-play attacks are less effective in Mexican Spanish, while visual attacks are more effective. Safety rankings do not transfer across languages.

AI safety Alignment Evals Vision

Summary generated by Claude — human-verified
