TRACES: Proactive Safety Auditing for Multi-Turn LLM Agents via Trajectory-State Modeling

arXiv:2605.27690v1 Announce Type: cross LLM agents increasingly operate through multi-turn tool use and environment interaction, where safety risks often emerge from intermediate steps long before they surface in the final outcome. Reactive auditing is therefore insufficient: post-hoc diagnosis frequently misses the chance to flag risks while they are unfolding. We propose TRACES, a representation-based proactive auditor that learns prefix-level trajectory risk states from the hidden representations of an observer.
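
To make the idea of prefix-level risk scoring concrete, the following is a minimal sketch, not the TRACES method itself. It assumes each trajectory step has already been encoded as a hidden-state vector by some observer, mean-pools the vectors seen so far, and applies a hypothetical linear probe followed by a sigmoid. The function names, the pooling choice, the probe weights, and the 0.5 threshold are all illustrative assumptions; the abstract does not specify any of them.

```python
import math
from typing import List, Optional, Sequence


def prefix_risk_scores(
    hidden_states: Sequence[Sequence[float]],
    weights: Sequence[float],
    bias: float,
) -> List[float]:
    """Return one risk score per trajectory prefix.

    Assumption: each prefix is summarized by mean-pooling the hidden
    states of its steps, then scored with a linear probe and a sigmoid.
    This is an illustrative stand-in, not the model from the paper.
    """
    scores: List[float] = []
    running_sum = [0.0] * len(weights)
    for t, h in enumerate(hidden_states, start=1):
        # Update the running sum so each prefix costs O(dim), not O(t * dim).
        running_sum = [s + x for s, x in zip(running_sum, h)]
        mean = [s / t for s in running_sum]
        logit = sum(w * m for w, m in zip(weights, mean)) + bias
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores


def first_flagged_step(scores: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Return the 0-based index of the first prefix whose score reaches
    the (hypothetical) threshold, or None if no prefix is flagged."""
    for step, score in enumerate(scores):
        if score >= threshold:
            return step
    return None


# Toy trajectory: four steps with 2-dimensional hidden states (made-up values).
trajectory = [[0.0, 0.0], [0.0, 0.0], [6.0, 0.0], [6.0, 0.0]]
scores = prefix_risk_scores(trajectory, weights=[1.0, 0.0], bias=-1.0)
flagged = first_flagged_step(scores)
```

In this toy run the score crosses the threshold at the third step, before the trajectory ends, which is the sense in which a prefix-level auditor can act proactively rather than after the final outcome.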