CIVIC: End-to-End Sequence Compactness for Efficient Vision-Language Models

arXiv:2605.28115v1

Vision-Language Models (VLMs) face severe memory and latency bottlenecks due to high-resolution visual tokens. While current token reduction methods theoretically save FLOPs, post-hoc pruning