I'm seeing low draft acceptance when using Qwen3.x MTP, what am I doing wrong?
r/LocalLLaMA
I'm using llama.cpp and have tried both Bartowski's quants and my own. With Qwen3.5-122B or Qwen3.6-27B, I'm seeing really low draft acceptance in chats with interleaved code snippets (chatting with the LLM about programming or a code project). Acceptance sits in the 40-60% range, whereas people here report around 80%.
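To put the gap between 40-60% and ~80% in perspective, here is a rough sketch of how acceptance rate relates to tokens produced per verification pass. It assumes the textbook speculative-decoding approximation: each of `k` drafted tokens is accepted independently with probability `alpha`, and the target model always contributes one token per pass. The number of MTP draft tokens (`k`) and treating the measured acceptance rate as `alpha` are assumptions for illustration, not values taken from llama.cpp's logs.

```python
def acceptance_rate(accepted: int, drafted: int) -> float:
    """Fraction of drafted tokens that the target model accepted."""
    if drafted <= 0:
        raise ValueError("drafted must be positive")
    return accepted / drafted


def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumption: each of the k drafted tokens is accepted independently with
    probability alpha, drafting stops at the first rejection, and the target
    model always emits one extra token (the correction or bonus token).
    This is the i.i.d. approximation, not a llama.cpp measurement.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        # Every draft accepted: k drafted tokens plus the target's own token.
        return float(k + 1)
    # Geometric series 1 + alpha + ... + alpha**k.
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


if __name__ == "__main__":
    # Compare the reported 40-60% range with the ~80% others see,
    # for a hypothetical draft length of 1 and 3 tokens.
    for k in (1, 3):
        for alpha in (0.4, 0.5, 0.6, 0.8):
            tokens = expected_tokens_per_step(alpha, k)
            print(f"k={k} alpha={alpha:.1f} -> {tokens:.2f} tokens/pass")
```

The measured acceptance rate is only a proxy for `alpha`, since tokens after the first rejection are discarded rather than tested, so treat these numbers as a back-of-the-envelope comparison rather than a speedup prediction.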