Looking for a working Deepseek-v4-Flash quant

r/LocalLLaMA

The best I've tried so far is the custom llama.cpp fork, but it suffers from low quality and random incoherent output. vLLM wouldn't run on anything other than H100s for DS4. Is there any quantization out there that works on llama.cpp or vLLM?

submitted by /u/ortegaalfredo