arXiv cs.CL

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

Signal: 78
Hype: 25
In three lines: A red-teaming study of 30+ open-source LLMs (10 families, 5 countries) measures their capacity to generate biased political content when jailbroken. The findings include systematic asymmetries (a left-leaning bias) and an Overton Window that contracts with model size. The study also reports substantial regional differences and jailbreak potency that varies across model families.
AI safety · Alignment · Open source · Evals · Regulation

Summary generated by Claude — human-verified