Enhancing Multimodal Large Language Models for Safety-Critical Driving Video Analysis

arXiv:2605.22185v1 Announce Type: new Recent advancements in Multimodal Large Language Models (MLLMs) have demonstrated impressive capabilities in general visual understanding. However, their application to safety-critical driving scenarios remains limited by an inability to accurately perceive and reason about rare, high-stakes dynamic events, such as collisions or near-collisions. To address this, we