arXiv cs.AI·19 May 2026

SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors

Signal

Hype

In three linesSLEIGHT-Bench is a benchmark of 40 evasion attacks against LLM-based coding agent monitors. Claude Opus 4.6 with extended thinking catches only 23% of attacks (24/40 never detected). Evasion strategies exploit model priors, instruction ambiguity, and state manipulation.

Read source

Your take?

AI Agents AI safety Benchmarks Evals Code generation

Summary generated by Claude — human-verified

SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors

Other angles on this story