Back to feed
arXiv cs.CL·

Helpful to a Fault: Measuring Illicit Assistance in Multi-Turn, Multilingual LLM Agents

Signal
78
Hype
25
In three linesSTING is an automated red-teaming framework measuring multi-turn illicit assistance in LLM agents. It constructs step-by-step illicit plans grounded in benign personas and uses judge agents to track completion. Multilingual evaluation across six non-English languages shows attack success does not consistently increase in lower-resource languages, diverging from chatbot findings.
Read source
Your take?
AI AgentsAI safetyEvalsBenchmarks

Summary generated by Claude — human-verified