I ran a quantization shootout on Qwen3-Coder and the results are... interesting

r/LocalLLaMA

Out of random curiosity I ran a shootout on Qwen3-Coder-Next. I've been using the MXFP4_MOE from unsloth for a while, as it's just really fast on my system, but I was curious about precision. I know quantization hurts the model, but I don't think I had really understood that until I tested it myself.

Hardware: 3× R9700 PRO (96 GB VRAM)
Backend: llama.cpp Vulkan
Eval: wikitext-2 (583 chunks, ctx 512)
Formats tested: MXFP4_MOE, Q4_K_M, Q5_K_M, UD-Q5_K_M

TLDR: UD-Q5_K_M is cooking! Better quality than formats half its size, barely any speed penalty.
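For anyone who wants to sanity-check numbers like these: a wikitext-2 eval of this kind is normally a perplexity run, so here is a minimal Python sketch of the arithmetic, assuming you already have per-token log-probabilities from the model. The function names (`chunk_tokens`, `perplexity`, `relative_increase`) are made up for illustration and are not part of llama.cpp, and the non-overlapping chunking is an assumption, not a description of how llama.cpp does it internally.

```python
import math
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], ctx: int = 512) -> List[Sequence[int]]:
    """Split a token stream into non-overlapping chunks of exactly `ctx` tokens.

    Assumption: a trailing partial chunk is dropped rather than padded.
    """
    n_chunks = len(tokens) // ctx
    return [tokens[i * ctx:(i + 1) * ctx] for i in range(n_chunks)]


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def relative_increase(ppl_quant: float, ppl_ref: float) -> float:
    """Fractional perplexity increase of a quantized model over a reference."""
    return (ppl_quant - ppl_ref) / ppl_ref


if __name__ == "__main__":
    # Toy data only: every token gets probability 0.25, so perplexity is 4.
    toy_logprobs = [math.log(0.25)] * 8
    print(perplexity(toy_logprobs))
    # Hypothetical perplexities, not measurements from this post.
    print(relative_increase(11.0, 10.0))
```

Lower perplexity is better, and comparing each quant against the same reference with `relative_increase` makes the quality gap easier to read than raw numbers. If the chunks really are non-overlapping, 583 chunks at ctx 512 works out to 583 × 512 = 298,496 tokens evaluated.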