Reddit r/LocalLLaMA·19 May 2026

Nemotron-Labs-Diffusion from NVIDIA

In three lines: NVIDIA releases Nemotron-Labs-Diffusion, a tri-mode model (autoregressive, diffusion, and self-speculation) in 3B, 8B, and 14B sizes. Self-speculation drafts tokens with diffusion and verifies them autoregressively, with both passes sharing one KV cache. Reported results: 3× higher acceptance length than Qwen3-8B-Eagle3, a 2.2× speedup, and a 4× speedup on GB200 (1015 tok/sec with custom CUDA kernels).
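To make the draft-then-verify idea concrete, here is a minimal toy sketch of greedy speculative decoding. It is not NVIDIA's implementation: `toy_target`, `toy_draft`, `verify_greedy`, and `speculative_decode` are hypothetical names, the "models" are simple arithmetic rules, and the verifier is called once per token for readability, whereas a real system would score the whole draft block in a single forward pass over a shared KV cache.

```python
from typing import Callable, List, Tuple

def toy_target(ctx: List[int]) -> int:
    """Hypothetical AR verifier: the next token is (last + 1) mod 10."""
    return (ctx[-1] + 1) % 10

def toy_draft(ctx: List[int], k: int) -> List[int]:
    """Hypothetical block drafter: first 3 positions right, the rest wrong."""
    last = ctx[-1]
    return [(last + i + 1) % 10 if i < 3 else (last + i + 6) % 10
            for i in range(k)]

def verify_greedy(prefix: List[int], draft: List[int],
                  target_next: Callable[[List[int]], int]) -> List[int]:
    """Keep the longest draft prefix the verifier agrees with, plus one verifier token."""
    ctx = list(prefix)
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # verifier's correction replaces the bad draft token
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))  # whole draft accepted: one bonus token
    return accepted

def speculative_decode(prefix: List[int],
                       draft_block: Callable[[List[int], int], List[int]],
                       target_next: Callable[[List[int]], int],
                       block_size: int, max_new: int) -> Tuple[List[int], int]:
    """Return the decoded sequence and the number of verification steps used."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < max_new:
        draft = draft_block(out, block_size)
        out.extend(verify_greedy(out, draft, target_next))
        steps += 1
    return out[:len(prefix) + max_new], steps

out, steps = speculative_decode([0], toy_draft, toy_target, block_size=4, max_new=8)
print(out, steps)
```

In this toy setup each verification step yields 4 tokens (3 accepted draft tokens plus 1 correction), so 8 new tokens take 2 steps instead of the 8 that plain token-by-token decoding would need, and the output is identical to what the verifier alone would produce. Acceptance length is the per-step count of tokens kept, which is why a higher acceptance length tends to translate into higher throughput.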

Code generation · Benchmarks

Summary generated by Claude — human-verified
