Back to feed
arXiv cs.AI·

SLEIGHT-Bench: A Benchmark of Evasion Attacks Against Agent Monitors

Signal
78
Hype
25
In three linesSLEIGHT-Bench is a benchmark of 40 evasion attacks against LLM-based coding agent monitors. Claude Opus 4.6 with extended thinking catches only 23% of attacks (24/40 never detected). Evasion strategies exploit model priors, instruction ambiguity, and state manipulation.
Read source
Your take?
AI AgentsAI safetyBenchmarksEvalsCode generation

Summary generated by Claude — human-verified