Jailbreaking Multimodal Large Language Models using Multi-Clip Video

arXiv:2606.02111v1 Announce Type: cross As multimodal large language models (MLLMs) have advanced to process video inputs, concerns have emerged about their potential for malicious misuse. Prior jailbreak studies have shown that safety alignment in MLLMs can be bypassed through visual inputs, yet it remains unclear which properties of video inputs induce this vulnerability. To address this gap, we