I added native MTP to exo for Qwen3.6 MLX models; here are the exactness and speed results
r/LocalLLaMA
I opened my first contribution to exo: native multi-token prediction for Qwen3.6-style MLX checkpoints. I hope it is useful. The personal motivation is simple: I am waiting for Mac Studios to arrive and I want to use exo as a local distributed inference cluster across them. Native MTP looked like one of the pieces worth getting right before that setup lands. For ed model cards it should work out of the box. The macOS setting is on by default, and the CLI path enables native MTP unless EXO_NATIVE_MTP_ENABLED=0 is set.
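To make the default-on behavior concrete, here is a minimal sketch of how an opt-out flag like this can be read. It is not exo's actual code: the helper name `native_mtp_enabled` is hypothetical, and I am assuming that only the literal value `0` disables the feature, since that is the only value mentioned above.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "EXO_NATIVE_MTP_ENABLED"  # variable name taken from the post


def native_mtp_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: native MTP stays on unless the variable is "0".

    Assumption: an unset variable, or any value other than "0", leaves
    the feature enabled. The real parsing in exo may differ.
    """
    if env is None:
        env = os.environ
    return env.get(ENV_VAR, "1").strip() != "0"


if __name__ == "__main__":
    state = "enabled" if native_mtp_enabled() else "disabled"
    print(f"native MTP is {state}")
```

Under that assumption, exporting `EXO_NATIVE_MTP_ENABLED=0` in the shell before launching would turn native MTP off, and leaving the variable unset keeps the default.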