TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling
Signal
78
Hype
22
In three lines
TRACES is a proactive safety auditor for multi-turn LLM agents that detects drift toward unsafe behavior from hidden representations of an observer LLM. Trained with weak trajectory-level supervision, it produces dense prefix-level risk estimates, improving full-trajectory safety prediction and proactive risk discrimination across multiple agent safety benchmarks.
Summary generated by Claude — human-verified
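
To make "weak trajectory-level supervision" and "dense prefix-level risk estimates" concrete, here is a minimal toy sketch. It is not the TRACES architecture, which the summary does not describe. Everything in it is an assumption for illustration: the synthetic per-turn vectors stand in for observer-LLM hidden states, each prefix is summarized by a running mean, and the probe is plain logistic regression. The helper names (`prefix_features`, `prefix_risks`, `fit_probe`, `make_trajectory`) are hypothetical.

```python
import numpy as np


def prefix_features(hidden):
    """Summarize each prefix of a (turns, dim) array by its running mean (an assumption)."""
    counts = np.arange(1, len(hidden) + 1)[:, None]
    return np.cumsum(hidden, axis=0) / counts


def prefix_risks(hidden, w, b):
    """Return one risk estimate in (0, 1) per prefix of the trajectory."""
    logits = prefix_features(hidden) @ w + b
    return 1.0 / (1.0 + np.exp(-logits))


def fit_probe(trajectories, labels, lr=1.0, steps=500):
    """Fit a linear probe using only one label per full trajectory (weak supervision)."""
    # Only the final prefix (the whole trajectory) is paired with a label.
    X = np.stack([prefix_features(h)[-1] for h in trajectories])
    y = np.asarray(labels, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y  # gradient of the logistic loss with respect to the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def make_trajectory(rng, unsafe, turns=6, dim=4, drift_at=3):
    """Synthetic stand-in for hidden states: unsafe runs drift along axis 0 from turn drift_at."""
    hidden = 0.05 * rng.standard_normal((turns, dim))
    if unsafe:
        hidden[drift_at:, 0] += 2.0
    return hidden


rng = np.random.default_rng(0)
labels = np.array([0, 1] * 20)
trajectories = [make_trajectory(rng, bool(y)) for y in labels]
w, b = fit_probe(trajectories, labels)

# Dense per-prefix estimates on held-out synthetic trajectories.
risks_unsafe = prefix_risks(make_trajectory(rng, True), w, b)
risks_safe = prefix_risks(make_trajectory(rng, False), w, b)
print("unsafe:", np.round(risks_unsafe, 2))
print("safe:  ", np.round(risks_safe, 2))
```

In this toy setup the probe never sees per-turn labels, yet applying it to every prefix gives a risk curve that rises once the synthetic drift begins. That is the sense in which trajectory-level labels can yield prefix-level estimates. A real auditor would need actual hidden states and a more careful pooling and training scheme.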