arXiv cs.CL

DLLM-JEPA: Joint Embedding Predictive Architectures for Masked Diffusion Language Models

Signal: 78
Hype: 25
In three lines: DLLM-JEPA pairs joint embedding predictive architectures (JEPA) with masked-diffusion language models for self-supervised representation learning. It eliminates the need for explicit multi-view data and reduces training FLOPs by 33% relative to LLM-JEPA. It reports GSM8K gains of +18.7 pp on LLaDA-8B and +11.4 pp on Dream-7B while preserving base-model capabilities.
Papers · Fine-tuning · Reasoning · Evals

Summary generated by Claude — human-verified