DeepSeek V4 Flash performance on DGX Spark
r/LocalLLaMA
Open Source AI
Hello Reddit, I have been trying to get DeepSeek V4 running on the DGX Spark for the past week. Yesterday I finally got it to work, thanks to the hard work of the folks at local-inference-lab. My units are the ASUS GX10 variant. Two GX10s are linked through their ConnectX-7 ports and running in Docker with a very janky setup. The max context I can safely fit in the KV cache is around 1M tokens. I typically run it with a 256k max context to leave room for concurrent requests. It's running the original MXFP8 x MXFP4 model for DeepSeek V4 Flash. There are some NVFP4 variants out there, but I haven't tested them.
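For anyone wondering why 256k instead of the full 1M: here's a rough sketch of the trade-off, assuming the KV cache behaves like a shared pool of token slots (as in paged-attention style servers). It only computes how many requests can each hold a *full* max-length context at once, which is a worst-case floor; real requests are usually shorter, so more can fit. The function name is hypothetical, and treating "1M" as 2^20 and "256k" as 2^18 tokens is my assumption, not a measured number.

```python
def max_full_length_sequences(kv_cache_tokens: int, max_context: int) -> int:
    """Worst-case count of requests that can each hold a full max_context
    window simultaneously, assuming one shared pool of KV-cache token slots."""
    if max_context <= 0:
        raise ValueError("max_context must be positive")
    return kv_cache_tokens // max_context


# Assumption: "around 1M tokens" of KV cache taken as 2**20 token slots.
KV_CACHE_TOKENS = 1_048_576

for ctx in (1_048_576, 262_144, 131_072):  # 1M, 256k, 128k max context
    print(f"max_context={ctx:>9,}: {max_full_length_sequences(KV_CACHE_TOKENS, ctx)} full-length request(s)")
```

Under these assumptions, a 1M max context leaves room for only one full-length request, while 256k allows four to sit in the cache at the same time.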