arXiv cs.CL

AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue

Signal: 78 | Hype: 15
In three lines
AERIC is a lightweight safety monitor (387 parameters) that detects implicit harmful dialogue by reading the model's hidden states during decoding, with no additional forward passes. It improves AUROC from 0.683 to 0.714 on DiaSafety and from 0.822 to 0.858 on Harmful Advice. In deployment it adds only 2.34% latency, versus 79.40% for Qwen3Guard-Stream-4B.
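
To make the idea of monitoring hidden states during decoding concrete, here is a minimal sketch of a generic logistic probe that scores each decoding step's hidden state and keeps a running maximum risk. This is an illustration only, not AERIC's architecture, which the summary does not describe. The class name `HiddenStateProbe`, the weights, the threshold, and the toy 2-dimensional hidden states are all hypothetical.

```python
import math
from typing import List, Sequence


class HiddenStateProbe:
    """Hypothetical logistic probe over per-step hidden states.

    Assumption: the decoder already exposes one hidden-state vector per
    generated token, so scoring it needs no extra forward pass.
    """

    def __init__(self, weights: Sequence[float], bias: float, threshold: float):
        self.weights = list(weights)
        self.bias = bias
        self.threshold = threshold
        self.max_risk = 0.0  # highest risk score seen so far in this dialogue

    @property
    def num_parameters(self) -> int:
        # One weight per hidden dimension plus a single bias term.
        return len(self.weights) + 1

    def step(self, hidden_state: Sequence[float]) -> bool:
        """Score one decoding step; return True once the dialogue is flagged."""
        if len(hidden_state) != len(self.weights):
            raise ValueError("hidden state size does not match probe weights")
        logit = sum(w * h for w, h in zip(self.weights, hidden_state)) + self.bias
        risk = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
        self.max_risk = max(self.max_risk, risk)
        return self.max_risk >= self.threshold


# Toy usage with made-up values: three decoding steps, 2-dim hidden states.
probe = HiddenStateProbe(weights=[1.0, -1.0], bias=0.0, threshold=0.8)
toy_hidden_states: List[List[float]] = [[0.0, 0.0], [2.0, 0.0], [0.0, 0.0]]
flags = [probe.step(h) for h in toy_hidden_states]
```

Because the probe only performs a dot product on a vector the decoder has already computed, its per-token cost is small relative to generation, which is the general reason a monitor of this kind can add little latency.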
Tags: AI safety, Alignment, Reasoning, Benchmarks

Summary generated by Claude — human-verified