arXiv cs.CL

TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling

In three lines
TRACES is a proactive safety auditor for multi-turn LLM agents that detects drift toward unsafe behavior from hidden representations of an observer LLM. Trained with weak trajectory-level supervision, it produces dense prefix-level risk estimates, improving full-trajectory safety prediction and proactive risk discrimination across multiple agent safety benchmarks.
AI Agents, AI safety, Reasoning

Summary generated by Claude — human-verified