Beyond Surrogate Gradients: Fully Differentiable Token Pruning for Vision-Language Models

arXiv:2605.28051v1 Announce Type: new Visual token pruning reduces the computational cost of Vision-Language Models (VLMs) by removing redundant visual tokens. Existing methods typically rely on Gumbel-Softmax to approximate discrete selection during