minor speed bump for MTP with Qwen3.6-27B-MTP Q6_K_XL

r/LocalLLaMA

I'm on a MacBook M5 Max with 128GB RAM, running a test in Open WebUI using llama-server (llama.cpp):

unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (non-MTP): 19 tps
unsloth/Qwen3.6-27B-UD-Q6_K_XL.gguf (MTP): 22.3 tps

That's roughly a 17% gain, so nothing like the massive improvements I hear about. Possibly my own settings, though. Both runs use:

--temp 0.6 --top-p 0.8 --top-k 20 --min-p 0.00 --cache-ram 24576 --batch-size 4096 --ubatch-size 2048

submitted by /u/chimph