InfoMerge: Information-aware Token Compression for Efficient Video Large Language Models

arXiv:2606.02161v1 Announce Type: cross Video Large Language Models (Video-LLMs) achieve strong performance in video understanding, but their excessive visual tokens bring substantial computational overhead. Existing