WINDQuant: Weight-Informed Neural Decision-Making for Global Mixed-Precision LLM Quantization

arXiv:2605.26660v1 Announce Type: new Quantization is an effective approach to reducing the memory footprint and inference cost of large language models (LLMs), yet maintaining performance in the ultra-low-bit regime remains challenging. Existing post-