DACA-GRPO: Denoising-Aware Credit Assignment for Reinforcement Learning in Diffusion Language Models
DACA-GRPO improves reinforcement learning for diffusion language models by addressing two problems: temporal credit assignment and mean-field likelihood bias. It introduces Denoising Progress Scores and Stratified Masking Likelihood, achieving gains of up to 7.4 pp on code generation and 5.6 pp on math reasoning across seven benchmarks.