LLM Guard scored 0/8 on a USENIX 2025 multi-turn jailbreak. Here’s what caught it instead.

r/artificial
Generative AI AI Safety

Crescendo (Russinovich, USENIX Security 2025) is a multi-turn jailbreak designed specifically to evade output-based monitors. Each individual turn looks completely innocent. The attack only exists across turns. LLM Guard result: 0/8 turns detected. It scores each prompt independently. It has no memory. It never sees the attack. Arc Sentry result: flagged at Turn 3. Arc Sentry doesn’t read the text. It reads what the model’s internal state does with the text.