vLLM gives 5x the speed of llama.cpp, but the quants I want (Unsloth GGUF) aren't available. What to do?

r/LocalLLaMA

Hi, I want to run an Unsloth dynamic quant on vLLM.

Why vLLM? It gives much faster prefill:
- llama.cpp: 800-1,000 tokens/sec
- vLLM: 5k-10k tokens/sec

On vLLM I tried the official Qwen3.6-35B-A3B FP8. The machine is an RTX A6000 (Ampere, 48 GB).

Why an Unsloth quant? For my task, writing pandas code, the Unsloth Q8 quant (tested on llama.cpp) gives correct pandas code, while even the official FP8 quant does much worse. I don't know why.
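To put the prefill numbers in perspective, here is a rough back-of-the-envelope sketch. It only uses the throughput ranges quoted above; the 20,000-token prompt is a hypothetical length picked for illustration, not something I measured, and decode time is ignored entirely.

```python
# Prefill throughput ranges quoted above (tokens/sec): (low, high).
LLAMA_CPP_TPS = (800, 1_000)
VLLM_TPS = (5_000, 10_000)

# Hypothetical prompt length, chosen only for illustration.
PROMPT_TOKENS = 20_000


def prefill_seconds(prompt_tokens: int, tokens_per_sec: float) -> float:
    """Seconds needed to prefill a prompt at the given throughput."""
    return prompt_tokens / tokens_per_sec


def speedup_range(slow: tuple[int, int], fast: tuple[int, int]) -> tuple[float, float]:
    """Smallest and largest speedup of `fast` over `slow`, given (low, high) ranges."""
    smallest = fast[0] / slow[1]  # slowest vLLM run vs fastest llama.cpp run
    largest = fast[1] / slow[0]   # fastest vLLM run vs slowest llama.cpp run
    return smallest, largest


if __name__ == "__main__":
    lo, hi = speedup_range(LLAMA_CPP_TPS, VLLM_TPS)
    print(f"vLLM prefill speedup: {lo:.1f}x to {hi:.1f}x")
    for name, (low_tps, high_tps) in (("llama.cpp", LLAMA_CPP_TPS), ("vLLM", VLLM_TPS)):
        fastest = prefill_seconds(PROMPT_TOKENS, high_tps)
        slowest = prefill_seconds(PROMPT_TOKENS, low_tps)
        print(f"{name}: {fastest:.1f}-{slowest:.1f} s to prefill {PROMPT_TOKENS} tokens")
```

With these ranges the speedup works out to 5.0x to 12.5x, so the "5x" in the title is the low end. For a 20,000-token prompt that is roughly 20-25 s of prefill on llama.cpp versus 2-4 s on vLLM.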