dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats

arXiv:2606.04115v1 Announce Type: cross Quantizing large language models (LLMs) to low-precision floating-point representations is central to efficient deployment, yet applying a single bit-width uniformly across all layers is sub-optimal in terms of both performance and accuracy. This work