AI RESEARCH

BAIT: Boundary-Guided Disclosure Escalation via Self-Conditioned Reasoning

arXiv cs.CL

arXiv:2605.27110v1 Announce Type: cross

In this work, we propose BAIT (Boundary-Aware Iterative Trap), a three-step jailbreak framework that approaches malicious goals through internal disclosure. BAIT first asks the model to identify the protection boundary, then requires it to refine that boundary, and finally requests a detailed example. Because each step builds on the model's previous responses, BAIT turns the model's own reasoning and its tendency toward consistency into a disclosure pathway.