Looking for a working Deepseek-v4-Flash quant
r/LocalLLaMA
The best I've tried so far is the custom llama.cpp fork, but it suffers from low quality and random incoherent output. vLLM wouldn't run on anything other than H100s for DS4. Is there any quantization out there that works on llama.cpp or vLLM?
submitted by /u/ortegaalfredo