arXiv cs.CL·19 May 2026

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents

Signal

Hype

In three linesSTING is an automated red-teaming framework measuring multi-turn illicit assistance in LLM agents. It constructs step-by-step illicit plans grounded in benign personas and uses judge agents to track completion. Multilingual evaluation across six non-English languages shows attack success does not consistently increase in lower-resource languages, diverging from chatbot findings.

Read source

Your take?

AI Agents AI safety Evals Benchmarks

Summary generated by Claude — human-verified

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents

Other angles on this story