MTP has no impact on my Qwen3.6 MoE performance

r/LocalLLaMA

Hello, I have an RTX 5060 Ti and I tried running Unsloth's Qwen3.6-35B GGUF with MTP. However, I get around 60 tok/s both with and without MTP. Here are my flags:

llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_M --temp 0.6 --top-p 0.95 --top-k 20 --min-p 0.00 --alias unsloth/Qwen3.6 --port 8002 --kv-unified --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on --fit on --no-mmproj --ctx-size 64000

For the MTP variant of