arXiv cs.CL

Red-Bandit: Test-Time Adaptation for LLM Red-Teaming via Bandit-Guided LoRA Experts

Signal: 78
Hype: 25
In three lines:
Red-Bandit is a red-teaming framework that adapts at test time by drawing on LoRA experts, each specialized for a different attack style (e.g., manipulation, slang) and trained via reinforcement learning. A multi-armed bandit algorithm dynamically selects which expert to use, based on how safe the target model's responses are. It reports state-of-the-art results on AdvBench while generating more readable prompts.
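The bandit-based selection step can be illustrated with a toy simulation. This is a minimal sketch, not the paper's method: the summary does not say which bandit algorithm is used, so UCB1 is an assumption here, as are the names `UCB1Selector` and `run_simulation`. Each arm stands in for one attack-style expert, and the reward is a simulated 0/1 signal for whether the target model's response was judged unsafe. No real model, LoRA adapter, or safety classifier is involved.

```python
import math
import random


class UCB1Selector:
    """UCB1 bandit over a fixed set of arms (assumed algorithm, for illustration)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {arm: 0 for arm in self.arms}    # pulls per arm
        self.totals = {arm: 0.0 for arm in self.arms}  # summed reward per arm

    def select(self):
        # Pull every arm once before applying the UCB rule.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        total_pulls = sum(self.counts.values())

        def score(arm):
            mean = self.totals[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(total_pulls) / self.counts[arm])
            return mean + bonus

        return max(self.arms, key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


def run_simulation(unsafe_rates, rounds=2000, seed=0):
    """Simulate expert selection against a stand-in target.

    unsafe_rates maps a hypothetical expert name to the made-up probability
    that its prompt elicits an unsafe response (reward 1.0, else 0.0).
    Returns the number of times each expert was selected.
    """
    rng = random.Random(seed)
    selector = UCB1Selector(unsafe_rates)
    for _ in range(rounds):
        arm = selector.select()
        reward = 1.0 if rng.random() < unsafe_rates[arm] else 0.0
        selector.update(arm, reward)
    return selector.counts


if __name__ == "__main__":
    # Illustrative rates only; these numbers are not from the paper.
    print(run_simulation({"manipulation": 0.8, "slang": 0.2}))
```

In this toy setup the selector concentrates its pulls on whichever expert most often elicits an unsafe response, while still occasionally trying the others.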
AI safety · Fine-tuning · Reinforcement learning · Evals · Papers

Summary generated by Claude — human-verified