Massive Spikes in LLMs are Bias Vectors: Mechanistic Uncovering and Spike-Free Quantization

arXiv:2606.02288v1 Announce Type: new Massive activation spikes in Large Language Models (LLMs) severely degrade quantization by stretching the dynamic range of activations. While prior hypotheses characterize these spikes as high-level scalar biases, we argue that they are merely the scalar intermediates of rigid, structural vector biases in the spike-carrying tokens. We show that, after normalization, these tokens converge to constant vectors that drive the attention sink and value-state drain mechanisms.
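
To make the dynamic-range claim concrete, here is a minimal sketch of how a single massive activation can hurt symmetric per-tensor (absmax) int8 quantization. It is an illustration under stated assumptions, not the method of the paper: the tensor shape, the spike value of 1000, its position, and the helper names `absmax_quantize` and `mean_abs_err_non_spike_tokens` are all hypothetical choices made for this example.

```python
import numpy as np

def absmax_quantize(x, bits=8):
    """Symmetric per-tensor quantization followed by dequantization.

    A single scale is derived from the largest magnitude in the tensor,
    so one outlier sets the step size for every other value.
    """
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    scale = np.max(np.abs(x)) / qmax      # step size of the integer grid
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale

def mean_abs_err_non_spike_tokens(x):
    """Mean absolute quantization error over tokens 1..N-1 (token 0 excluded)."""
    err = np.abs(absmax_quantize(x) - x)
    return err[1:].mean()

rng = np.random.default_rng(0)

# Assumed toy activations: 16 tokens, 64 channels, unit-variance noise.
acts = rng.normal(0.0, 1.0, size=(16, 64))

# Same activations with one massive spike in token 0, channel 3 (illustrative).
spiked = acts.copy()
spiked[0, 3] = 1000.0

err_clean = mean_abs_err_non_spike_tokens(acts)
err_spiked = mean_abs_err_non_spike_tokens(spiked)

print(f"error without spike: {err_clean:.4f}")
print(f"error with spike:    {err_spiked:.4f}")
```

With the spike present, the step size grows to roughly 1000 / 127, which is larger than almost every ordinary activation, so most values on the other tokens round to zero. Removing or isolating the spike before choosing the scale avoids this loss.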