arXiv cs.LG

Hierarchical Variational Policies for Reward-Guided Diffusion

Signal: 72 · Hype: 18
In three lines:
A hierarchical variational framework for adapting pretrained diffusion models to reward-aligned objectives. It formulates test-time adaptation as a lightweight stochastic policy that amortizes per-step control. On 4x super-resolution, it reports better perceptual quality and 5x faster inference than the best baseline.
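To make "a lightweight stochastic policy that amortizes per-step control" concrete, here is a minimal toy sketch in Python. It is not the paper's method: the 1-D sampler `base_step`, the linear-Gaussian `policy_control`, and the parameter names `gain`, `bias`, and `std` are all hypothetical stand-ins, and neither the hierarchical structure nor the training objective is modeled. It shows only the general idea that a small fixed policy can add a sampled control term at every reverse step, rather than running a separate optimization at each step.

```python
import numpy as np


def base_step(x):
    """Hypothetical stand-in for one reverse step of a pretrained sampler.

    Assumption: a real diffusion model would call a denoising network here;
    this toy version just contracts samples toward 0.
    """
    return 0.9 * x


def policy_control(x, params, rng):
    """Toy Gaussian policy: sample a control u ~ N(gain * x + bias, std^2).

    The same small parameter set is reused at every step, which is the
    'amortized' part of this sketch.
    """
    mean = params["gain"] * x + params["bias"]
    return mean + params["std"] * rng.standard_normal(x.shape)


def sample(params, num_steps=10, batch=1000, seed=0):
    """Run the toy reverse process, adding a sampled control at each step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(batch)  # start from Gaussian noise
    for _ in range(num_steps):
        x = base_step(x) + policy_control(x, params, rng)
    return x


# Illustrative parameters only; in practice they would be learned from a reward.
no_control = {"gain": 0.0, "bias": 0.0, "std": 0.0}
steered = {"gain": 0.0, "bias": 0.1, "std": 0.05}

x_plain = sample(no_control)
x_steered = sample(steered)
print("mean without control:", x_plain.mean())
print("mean with control:   ", x_steered.mean())
```

In this toy setup the positive `bias` pushes the final samples toward larger values, standing in for steering toward a higher reward, and each step costs one cheap policy evaluation.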
Reinforcement learning · Image generation

Summary generated by Claude — human-verified