arXiv cs.CL·25 May 2026

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

In three lines: A red-teaming study of 30+ open-source LLMs (10 families, 5 countries) that measures their capacity to generate biased political content via jailbreaks. Findings: systematic asymmetries (a left-leaning bias), contraction of the Overton Window with model size, substantial regional differences, and jailbreak potency that varies across model families.

AI safety, Alignment, Open source, Evals, Regulation

Summary generated by Claude — human-verified
