AI RESEARCH

PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI

arXiv CS.CL

arXiv:2605.27545v1 Announce Type: new Jailbreak attacks on multimodal AI systems remain underexplored, even though unsafe image generation can have more severe consequences than unsafe text and current defenses are relatively immature. We characterize the attack along two dimensions. First, breadth: through temporal deepening, the framework incrementally strengthens historical anchoring and archival cues, eroding refusal boundaries across models with varying alignment strength.