Qwen3.6 27B Pure Quant: 40 tok/s on 16 GB VRAM
r/LocalLLaMA
Hello everyone! I want to share the result of my experiment to make Qwen3.6 27B Q4_K_M fit into my RTX 5060 Ti 16 GB. It was inspired by u/Due-Project-7507's work on Ununnilium/Qwen3.6-27B-IQ4_XS-pure-GGUF. Using the same pure quantization method, I was able to create Q4_K_M GGUFs that fit completely in 16 GB of VRAM. Model URL: There are two versions: Q4_K_M MTP (15.4 GB) and Q4_K_M non-MTP (15.1