Reddit r/MachineLearning

Masked Diffusion Language Models are Strong and Steerable Text-Based World Models for Agentic RL [R]

Signal: 78 · Hype: 25
In three lines
Masked diffusion language models (MDLMs) outperform autoregressive LLMs as world models for agentic RL.
Fine-tuned SDAR-8B and WeDLM-8B achieve 4x gains on BLEU-1, ROUGE-L, and MAUVE.
GRPO training yields a +15% absolute gain in task success on ScienceWorld, ALFWorld, and AppWorld with Qwen3, Mistral, and LFM2.5 in zero-shot transfer.
AI Agents · Reinforcement learning · Reasoning · Benchmarks · Papers

Summary generated by Claude — human-verified