arXiv cs.CL · 29 May 2026

Benchmarking Open-Source Safety Guard Models: A Comprehensive Evaluation

In three lines: The paper evaluates 14 open-source safety guard models on 79,331 samples across 8 NIST AI Risk Framework categories. Qwen Guard (4B) achieves the highest recall (83.97%), outperforming Llama Guard (12B) and GPT-OSS Safeguard (20B). Model size does not correlate with safety detection performance.

Summary generated by Claude — human-verified
