WildCat: Near-Linear Attention in Theory and Practice

We introduce WildCat, a high-accuracy, low-cost approach to compressing the attention mechanism in neural networks. While attention is a staple of modern network architectures, it is also notoriously expensive to deploy due to resource requirements that scale quadratically with the input sequence length $n$. Crucially, we select the coreset using a fast but spectrally accurate subsampling algorithm -- randomly