DACA-GRPO: Denoising-Aware Credit Assignment for Reinforcement Learning in Diffusion Language Models
Signal: 78
Hype: 15
In three lines: DACA-GRPO improves reinforcement learning for diffusion language models by addressing temporal credit assignment and mean-field likelihood bias. It introduces Denoising Progress Scores and Stratified Masking Likelihood, achieving gains of up to 7.4 pp on code generation and 36.3 pp on constraint satisfaction across seven benchmarks.
Summary generated by Claude — human-verified