Pop-Up Distractions Reveal Bag-of-Events Behavior in Video Large Language Models

arXiv:2605.27101v1 Announce Type: cross A key capability for video understanding is reliably linking subjects to events across time, yet whether Video Large Language Models (VideoLLMs) actually achieve this remains unclear. In this work, we