EarlyTom: Early Token Compression Completes Fast Video Understanding

arXiv:2605.30010v1 Announce Type: new Video large language models (Video-LLMs) have demonstrated strong capabilities in video understanding tasks. However, their practical deployment is still hindered by the inefficiency