Inner Product Aware Quantization: Provably Fast, Accurate, and Adaptive Algorithms

arXiv:2606.00289v1 Announce Type: new

Quantization is a fundamental tool for compressing datasets and neural network weights, and for reducing memory usage across a range of computational tasks. Many downstream applications of vector quantization compute inner products with arbitrary inputs. This motivates the study of inner product aware quantization schemes, which approximately preserve inner products with unseen vectors rather than simply minimizing the mean-squared error.
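To make the distinction between the two objectives concrete, the following toy sketch measures both the reconstruction mean-squared error and the inner product error against a random "unseen" query. It is not the paper's algorithm: the uniform grid, the two rounding rules (`quantize_nearest`, `quantize_stochastic`), the dimension, and the step size are all assumptions chosen purely for illustration.

```python
import math
import random


def quantize_nearest(x, step):
    """Round each coordinate to the nearest multiple of `step` (deterministic)."""
    return [step * round(v / step) for v in x]


def quantize_stochastic(x, step, rng):
    """Round each coordinate up or down at random so that E[q(x)] = x."""
    out = []
    for v in x:
        lo = math.floor(v / step)
        frac = v / step - lo  # position inside the grid cell, in [0, 1)
        out.append(step * (lo + (1 if rng.random() < frac else 0)))
    return out


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


# Illustrative (assumed) setup: Gaussian data vector and Gaussian query.
rng = random.Random(0)
dim, step = 256, 0.5
x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
query = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # vector not seen at quantization time

x_near = quantize_nearest(x, step)
x_stoch = quantize_stochastic(x, step, rng)

# Objective 1: how well the quantized vector reconstructs x.
mse_near, mse_stoch = mse(x, x_near), mse(x, x_stoch)

# Objective 2: how well the inner product with the query is preserved.
ip_err_near = abs(dot(x, query) - dot(x_near, query))
ip_err_stoch = abs(dot(x, query) - dot(x_stoch, query))

print(f"reconstruction MSE   nearest={mse_near:.4f}  stochastic={mse_stoch:.4f}")
print(f"inner product error  nearest={ip_err_near:.4f}  stochastic={ip_err_stoch:.4f}")
```

On this grid, nearest rounding never has a larger reconstruction MSE than stochastic rounding, yet that fact alone says nothing about which scheme gives the smaller inner product error for a particular query. Stochastic rounding, by contrast, gives an unbiased estimate of the inner product with any fixed query. The sketch only illustrates that the two quantities are measured differently; it makes no claim about the schemes or guarantees studied in the paper.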