arXiv cs.CL · 28 May 2026

TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling

In three lines: TRACES is a proactive safety auditor for multi-turn LLM agents that detects drift toward unsafe behavior from hidden representations of an observer LLM. Trained with weak trajectory-level supervision, it produces dense prefix-level risk estimates, improving full-trajectory safety prediction and proactive risk discrimination across multiple agent safety benchmarks.

AI Agents · AI safety · Reasoning

Summary generated by Claude — human-verified
