arXiv cs.CL·10 June 2026

Prefilling-dLLM: Predictive Prefilling for Long-Context Inference in Diffusion Language Models

In three lines: Prefilling-dLLM speeds up diffusion language model inference by partitioning the context into chunks, caching their KV representations, and selecting relevant chunks with intra-chunk token sparsity. It achieves a 9.1–28.0x speedup on 8K–32K contexts without full prefix re-encoding.
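To make the chunk-then-token selection idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the scoring rule (dot product against a mean-pooled chunk key), the function names, and the parameters `chunk_size`, `top_chunks`, and `tokens_per_chunk` are all assumptions chosen for illustration, and the toy "keys" stand in for cached KV representations.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def partition(keys: List[Vector], chunk_size: int) -> List[List[int]]:
    """Split token positions 0..n-1 into contiguous chunks."""
    n = len(keys)
    return [list(range(s, min(s + chunk_size, n))) for s in range(0, n, chunk_size)]


def chunk_summary(keys: List[Vector], chunk: List[int]) -> List[float]:
    """Mean-pool the keys of one chunk (hypothetical summary; would be cached once)."""
    dim = len(keys[chunk[0]])
    return [sum(keys[i][d] for i in chunk) / len(chunk) for d in range(dim)]


def select_tokens(
    keys: List[Vector],
    query: Vector,
    chunk_size: int,
    top_chunks: int,
    tokens_per_chunk: int,
) -> List[int]:
    """Return the token positions kept after chunk selection and token sparsity."""
    chunks = partition(keys, chunk_size)
    # Assumption: chunk summaries are computed once and reused across steps,
    # which is what avoids re-encoding the full prefix in this toy setting.
    summaries = [chunk_summary(keys, c) for c in chunks]

    # Step 1: rank chunks by relevance to the query and keep the best ones.
    ranked = sorted(
        range(len(chunks)),
        key=lambda c: dot(query, summaries[c]),
        reverse=True,
    )

    kept: List[int] = []
    for c in sorted(ranked[:top_chunks]):
        # Step 2: inside each selected chunk, keep only the highest-scoring tokens.
        best = sorted(chunks[c], key=lambda i: dot(query, keys[i]), reverse=True)
        kept.extend(sorted(best[:tokens_per_chunk]))
    return kept


if __name__ == "__main__":
    toy_keys = [[1.0, 0.0], [0.9, 0.0], [0.0, 1.0], [0.0, 0.8], [-1.0, 0.0], [-1.0, 0.0]]
    print(select_tokens(toy_keys, [0.0, 1.0], chunk_size=2, top_chunks=1, tokens_per_chunk=1))
```

In this toy setup, attention would then run only over the returned positions rather than the whole context, which is where a speedup of the kind described could plausibly come from.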

Reasoning Benchmarks Infrastructure

Summary generated by Claude — human-verified
