arXiv cs.CL

Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

Signal: 75 · Hype: 35
In three lines: EvoSynth, an autonomous multi-agent framework, optimizes jailbreak attacks in executable code space rather than prompt space. The system iteratively evolves and self-corrects code-based attack algorithms. Reported results: an 85.5% Attack Success Rate (ASR) against Claude-Sonnet-4.5 and a 95.9% average ASR across the evaluated targets.
AI Agents · Multi-agent · Claude · AI safety · Papers

Summary generated by Claude — human-verified