arXiv cs.CL·19 May 2026

Red-Bandit: Test-Time Adaptation for LLM Red-Teaming via Bandit-Guided LoRA Experts

In three lines: Red-Bandit is a red-teaming framework that adapts in real time, drawing on specialized LoRA experts for different attack styles (e.g. manipulation, slang) via reinforcement learning. A multi-armed bandit algorithm dynamically selects an expert based on the safety of the target model's responses. It reports state-of-the-art results on AdvBench with more readable prompts.
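To make the bandit step concrete, here is a minimal sketch of expert selection with UCB1. The summary does not say which bandit algorithm the paper uses, so UCB1 is a stand-in chosen for illustration. The arm name `expert_c`, the success probabilities, and the simulated 0/1 reward are all assumptions; in the framework described above, the reward would instead come from judging the safety of the target model's response.

```python
import math
import random


class UCB1:
    """UCB1 bandit over a fixed set of named arms (here: hypothetical experts)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}    # pulls per arm
        self.values = {a: 0.0 for a in self.arms}  # running mean reward per arm

    def select(self):
        # Pull every arm once before applying the confidence bound.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        total = sum(self.counts.values())
        # Mean reward plus an exploration bonus that shrinks as an arm is pulled.
        return max(
            self.arms,
            key=lambda a: self.values[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, reward):
        # Incremental update of the running mean for the chosen arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def run_simulation(success_probs, rounds=2000, seed=0):
    """Simulate selection with Bernoulli rewards (assumed, not from the paper)."""
    rng = random.Random(seed)
    bandit = UCB1(success_probs)
    for _ in range(rounds):
        arm = bandit.select()
        # Placeholder reward: 1.0 stands for "response judged unsafe".
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    # Illustrative probabilities only; they are not results from the paper.
    probs = {"manipulation": 0.2, "slang": 0.6, "expert_c": 0.1}
    result = run_simulation(probs)
    for name in result.arms:
        print(name, result.counts[name], round(result.values[name], 3))
```

Over many rounds the pull counts concentrate on whichever arm has the highest observed reward, which is the sense in which a bandit "dynamically selects" an expert.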

AI safety · Fine-tuning · Reinforcement learning · Evals · Papers

Summary generated by Claude — human-verified
