How Quantization Changes Interpretable Features: A Sparse Autoencoder Analysis of Language Models

arXiv:2606.03002v1 Announce Type: new Quantization is a standard path to deploying large language models, and a quantized model is typically judged acceptable when its perplexity or downstream accuracy stays close to that of the full-precision original. Whether the model still computes in the same way, and in particular whether the interpretable features identified in the full-precision model survive weight rounding, is rarely tested, even as safety audits and steering interventions increasingly rely on those features.