[OSS] dlmserve - first serving engine for diffusion language models
r/LocalLLaMA
Spent the last few months building this on a single RTX 5070. Quick context: diffusion language models (like LLaDA from gsai-ml) are a different beast from GPT-style autoregressive LLMs. Instead of generating one token at a time, they start with a fully masked sequence and iteratively denoise the whole thing in parallel. Cool tech, but mainstream serving engines are all built around the autoregressive contract of one new token per step, so none of them serve diffusion LLMs.
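To make the "start fully masked, denoise in parallel" idea concrete, here's a toy sketch of the decode loop. This is not dlmserve's code or LLaDA's actual sampler: `toy_denoiser`, `diffusion_decode`, the `MASK` string, and the "commit the most confident positions each step" schedule are all assumptions made up for illustration, and the "model" is just a random number generator.

```python
import random

MASK = "<mask>"  # hypothetical mask token, for illustration only


def toy_denoiser(tokens, rng):
    """Stand-in for a real model: propose a (token, confidence) pair
    for every position in one parallel pass."""
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    return [(rng.choice(vocab), rng.random()) for _ in tokens]


def diffusion_decode(length, steps, seed=0):
    """Start from an all-mask sequence and fill it in over `steps` passes."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # One forward pass scores every position at once.
        proposals = toy_denoiser(tokens, rng)
        # Assumed schedule: commit an even share of the remaining masked
        # positions each step (ceiling division), highest confidence first.
        remaining_steps = steps - step
        k = -(-len(masked) // remaining_steps)
        best = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = proposals[i][0]
    return tokens


if __name__ == "__main__":
    print(diffusion_decode(length=8, steps=4))
```

The thing to notice is that every step touches the whole sequence and positions get filled in out of order, which is a different shape of work from the one-token-per-step loop that autoregressive engines are built around.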