Explicit Critic Guidance for Aligning Diffusion Models
Signal: 78 | Hype: 15
In three lines
A new online reinforcement learning method for aligning diffusion models with non-differentiable objectives.
A state-aligned latent actor-critic framework in which the diffusion model predicts values directly on noisy latent states, enabling trajectory-level PPO training and multi-reward optimization.
Outperforms prior baselines on benchmarks with both UNet and DiT backbones.
Summary generated by Claude — human-verified
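The actor-critic idea summarized above can be illustrated with a generic sketch. This is not the paper's implementation. It treats the denoising process as a T-step episode, assumes the reward arrives only when the final clean sample is produced, assumes multiple rewards are combined by a fixed weighted sum, and assumes the critic outputs one scalar value per noisy latent. The helper names (`combine_rewards`, `gae_over_denoising`, `value_targets`, `ppo_clip_loss`) are hypothetical. In practice the log-probabilities would come from each step's Gaussian denoising transition; here they are plain numbers.

```python
import math
from typing import List, Sequence


def combine_rewards(rewards: Sequence[float], weights: Sequence[float]) -> float:
    # Assumption: several (possibly non-differentiable) rewards are
    # scalarized with fixed weights before advantage estimation.
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    return sum(r * w for r, w in zip(rewards, weights))


def gae_over_denoising(values: Sequence[float], final_reward: float,
                       gamma: float = 1.0, lam: float = 0.95) -> List[float]:
    # values[t] is a hypothetical critic estimate V(x_t, t) on the noisy
    # latent at denoising step t (step 0 = noisiest, step T-1 = last).
    # Only the final transition, which yields the clean sample, is rewarded.
    T = len(values)
    advantages = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        last = t == T - 1
        reward = final_reward if last else 0.0
        next_value = 0.0 if last else values[t + 1]
        delta = reward + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running          # GAE recursion
        advantages[t] = running
    return advantages


def value_targets(values: Sequence[float], advantages: Sequence[float]) -> List[float]:
    # Regression targets for the critic: returns = advantages + values.
    return [a + v for a, v in zip(advantages, values)]


def ppo_clip_loss(new_logp: Sequence[float], old_logp: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    # Standard PPO clipped surrogate, averaged over every denoising step
    # of the trajectory (to be minimized).
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Usage with made-up numbers for a 3-step trajectory and two reward models.
reward = combine_rewards([0.8, 0.4], [0.5, 0.5])
values = [0.5, 0.55, 0.6]
adv = gae_over_denoising(values, reward)
critic_targets = value_targets(values, adv)
loss = ppo_clip_loss([-1.0, -1.1, -0.9], [-1.0, -1.0, -1.0], adv)
```

Because the critic scores every intermediate latent, each denoising step gets its own advantage instead of sharing one trajectory-level reward, which is the motivation for predicting values on noisy states.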