unsloth vs bartowski MTP ggufs
r/LocalLLaMA
I noticed that bartowski's MTP ggufs are bigger than unsloth's. I asked bartowski, and he said he used a Q8_0 quant for the MTP head. So I compared the decoding performance of the two, using this command:

/build/bin/llama-server -m ~/gguf/Qwen3.5-4B-Q4_0.gguf --host 0.0.0.0 --port 8080 -c 4096 -fa on --no-mmap -np 1 -ngl 99 --spec-type draft-mtp

Since I am interested in running them on Snapdragon smartphones, I only tested Q4_0, IQ4_NL, Q4_1, MXFP4_MOE, and Q8_0. I am limited by the 24GB of VRAM on my 3090, so I can't test Q8_0 for the big models.
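For anyone who wants to repeat the comparison, here is a rough sketch of how decoding speed could be read from a running server. It is not necessarily how the numbers in this post were collected. It assumes the server started by the command above is reachable on port 8080 and that its `/completion` response carries a `timings` object with `predicted_n` and `predicted_ms`; the function names, prompt, and token count are made up for illustration.

```python
import json
import urllib.request


def decode_tps(timings: dict) -> float:
    """Generation-phase tokens per second from a timings dict.

    Assumes `predicted_n` (tokens generated) and `predicted_ms`
    (generation time in milliseconds) are present.
    """
    return timings["predicted_n"] / (timings["predicted_ms"] / 1000.0)


def measure(base_url: str = "http://localhost:8080",
            prompt: str = "Explain speculative decoding.",
            n_predict: int = 256) -> float:
    """Send one completion request and return decoding tokens per second.

    The endpoint path and response layout are assumptions about the server
    build in use; adjust them if your build reports timings differently.
    """
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode()
    req = urllib.request.Request(
        f"{base_url}/completion",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return decode_tps(json.load(resp)["timings"])


if __name__ == "__main__":
    # Run once per gguf (restart the server with a different -m each time)
    # and compare the printed values.
    print(f"{measure():.1f} tokens/s")
```

Running it a few times per file and averaging would smooth out noise, since speculative decoding speed depends on how many drafted tokens get accepted for a given prompt.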