FP16 on Qwen 3.6 27B

r/LocalLLaMA

Has anyone seen a notable difference between Q8 and FP16, for both the weights and the KV cache? I know the jump to Q8 is significant. I would test it myself, but FP16 on my setup is painfully slow.

Side question: is ~14 TPS about what I should expect on a Strix Halo running 3.6 27B at Q8 during coding tasks? I have my MTP max draft set to 3, and it seems slightly better than 2, which runs at ~11 TPS.

Another side note, in case you haven't run into it: 27B is way better when the context is below 100k.
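A rough way to sanity-check TPS numbers like these is a back-of-envelope estimate that treats decoding as memory-bandwidth bound, so every weight is read once per generated token. The sketch below is only that, a sketch: the 256 GB/s figure is the theoretical peak I am assuming for Strix Halo (real sustained bandwidth is lower), 1.0625 bytes per weight assumes a Q8_0-style format at 8.5 bits per weight, the model is assumed to be a dense 27B, and the acceptance rate is a made-up placeholder, not a measurement. The function names are hypothetical.

```python
def decode_tps_ceiling(params_billion, bytes_per_weight, bandwidth_gb_s):
    """Upper bound on decode tokens/s if all weights are read once per token.

    Ignores KV cache reads, activations, and compute time, so real numbers
    will be lower.
    """
    weight_gb = params_billion * bytes_per_weight
    return bandwidth_gb_s / weight_gb


def with_speculation(base_tps, draft_len, accept_rate):
    """Scale base_tps by the expected tokens produced per verification pass.

    Assumes each drafted token is accepted independently with probability
    accept_rate, giving 1 + a + a^2 + ... + a^draft_len tokens per pass.
    Ignores the cost of producing the drafts and of verifying them.
    """
    expected_tokens = sum(accept_rate ** k for k in range(draft_len + 1))
    return base_tps * expected_tokens


# Assumed inputs (not measured): dense 27B model, 256 GB/s peak bandwidth.
BANDWIDTH_GB_S = 256.0
q8_ceiling = decode_tps_ceiling(27, 1.0625, BANDWIDTH_GB_S)
fp16_ceiling = decode_tps_ceiling(27, 2.0, BANDWIDTH_GB_S)

print(f"Q8 ceiling:   {q8_ceiling:.1f} TPS")
print(f"FP16 ceiling: {fp16_ceiling:.1f} TPS")

# Placeholder acceptance rate of 0.5, purely for illustration.
for draft_len in (2, 3):
    tps = with_speculation(q8_ceiling, draft_len, 0.5)
    print(f"Q8 with max draft {draft_len}: {tps:.1f} TPS (optimistic)")
```

Under these assumptions the Q8 ceiling without speculation lands near 9 TPS and FP16 at roughly half that, so low-teens TPS with MTP drafting is at least in a plausible range. Plugging in a measured bandwidth and acceptance rate would give a tighter estimate.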