arXiv cs.CL · 2 June 2026

DLLM-JEPA: Joint Embedding Predictive Architectures for Masked Diffusion Language Models

In three lines: DLLM-JEPA pairs a Joint Embedding Predictive Architecture (JEPA) with masked-diffusion language models for self-supervised representation learning. It eliminates the need for explicit multi-view data and reduces training FLOPs by 33% relative to LLM-JEPA. On GSM8K it improves results by 18.7 percentage points for LLaDA-8B and by 11.4 points for Dream-7B while preserving the base models' capabilities.

Summary generated by Claude — human-verified
