arXiv cs.AI

DACA-GRPO: Denoising-Aware Credit Assignment for Reinforcement Learning in Diffusion Language Models

Signal: 78
Hype: 15
In three lines: DACA-GRPO improves reinforcement learning for diffusion language models by addressing temporal credit assignment and mean-field likelihood bias. It introduces Denoising Progress Scores and Stratified Masking Likelihood, achieving gains of up to 7.4 pp on code generation and 5.6 pp on math reasoning across seven benchmarks.
Reinforcement learning · Reasoning · Code generation · Papers

Summary generated by Claude — human-verified