Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents
Signal
78
Hype
25
In three linesSTING is an automated red-teaming framework measuring multi-turn illicit assistance in LLM agents. It constructs step-by-step illicit plans grounded in benign personas and uses judge agents to track completion. Multilingual evaluation across six non-English languages shows attack success does not consistently increase in lower-resource languages, diverging from chatbot findings.Read source
Your take?
Summary generated by Claude — human-verified