AI RESEARCH
Jailbreak susceptibility prediction and mitigation via the behavioral geometry of models
arXiv CS.AI
arXiv:2605.26409v1 Announce Type: cross Evaluating and mitigating a generative system's susceptibility to jailbreak attacks is critical to its safe deployment. Given the number of deployable systems, full per-configuration evaluation and optimization is impractical. In this paper, we formalize the behavioral geometry of a population of models that, by leveraging previously evaluated and defended models, enables both efficient susceptibility prediction and effective defense transfer across the population.