AI RESEARCH
Evolving and Detecting Multi-Turn Deception using Geometric Signatures
arXiv cs.LG
arXiv:2605.27671v1 Announce Type: cross
Safety defenses for large language models (LLMs) are typically trained and evaluated on single-turn prompts, yet real attacks often unfold as indirect, multi-turn probing. To defend against this nuanced form of deception, we present a unified pipeline that generates realistic multi-turn deceptive question sets via multi-objective genetic prompt optimization with co-evolving mutation operators.