arXiv cs.CL

CHASE: Adversarial Red-Blue Teaming for Improving LLM Safety using Reinforcement Learning

Signal: 82
Hype: 18
In three lines: CHASE is a co-evolutionary red-blue teaming framework that trains an attacker and a defender via GRPO to improve LLM robustness against prompt-rewriting attacks (persona modulation, fictional framing). Evaluated on BeaverTails and JailbreakBench, it reduces the StrongREJECT score by 43.2% with 0% false refusals on benign prompts.
AI safety, Alignment, Reinforcement learning, Evals

Summary generated by Claude — human-verified