[OSS] dlmserve - first serving engine for diffusion language models
Signal
75
Hype
25
In three linesdlmserve is the first serving engine for diffusion language models (LLaDA, Dream-7B). Unlike autoregressive LLMs, they denoise a fully masked sentence in parallel. OpenAI-compatible API, continuous batching, 2.5x throughput vs HuggingFace at batch=4, runs in 12 GB VRAM. MIT licensed, pip install dlmserve.Read source
Your take?
Summary generated by Claude — human-verified