PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI

arXiv:2605.27545v1 Announce Type: new Jailbreak attacks on multimodal AI systems remain underexplored, even though unsafe image generation can have more severe consequences than unsafe text and current defenses are relatively immature. We characterize the attack along two dimensions. First, breadth: through temporal deepening, the framework incrementally strengthens historical anchoring and archival cues, eroding refusal boundaries across models with varying alignment strength.