arXiv cs.AI

Sketch Then Paint: Hierarchical Reinforcement Learning for Diffusion Multi-Modal Large Language Models

Signal: 72 · Hype: 25
In three lines: HT-GRPO is a hierarchical reinforcement learning method for diffusion multi-modal large language models. It organizes optimization into three stages (global, structure, and refinement), explores multiple unmasking sequences, and assigns differentiated rewards according to token importance. Experiments on MMaDA and Lumina-DiMOO show gains on the GenEval and DPG benchmarks.
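The summary does not give HT-GRPO's formulas, so the following is only a rough illustration of the ingredients it names, not the paper's method. It combines three pieces. The first is the generic GRPO step: rewards for several sampled rollouts (here, alternative unmasking sequences) are normalized against their group's mean and standard deviation. The second is a hypothetical split of the unmasking schedule into the three stages. The third is a hypothetical per-token weighting that spreads a rollout's advantage according to an importance score. The stage boundaries, the importance scores, and all function names are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Generic GRPO-style advantage: each rollout's reward is normalized
    against the mean and population std of its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division safe when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


def stage_of_step(step, total_steps, boundaries=(0.3, 0.7)):
    """Hypothetical mapping of an unmasking step to one of three stages.
    The 0.3 / 0.7 boundaries are illustrative assumptions."""
    frac = step / total_steps
    if frac < boundaries[0]:
        return "global"
    if frac < boundaries[1]:
        return "structure"
    return "refinement"


def token_weighted_advantages(advantage, importance):
    """Hypothetical differentiated credit: scale one rollout's advantage per
    token in proportion to an importance score. Weights are rescaled so
    their average is 1, preserving the rollout's mean advantage."""
    total = sum(importance)
    n = len(importance)
    return [advantage * n * w / total for w in importance]


if __name__ == "__main__":
    # Made-up rewards for four sampled unmasking sequences of one prompt
    rewards = [0.2, 0.8, 0.5, 0.5]
    advs = group_relative_advantages(rewards)
    print([round(a, 3) for a in advs])
    print([stage_of_step(s, 10) for s in (0, 5, 9)])
    # Made-up importance scores for three tokens of the best rollout
    print(token_weighted_advantages(advs[1], [1.0, 1.0, 2.0]))
```

Normalizing within the group means a rollout is rewarded only for beating its siblings sampled from the same prompt. The per-token rescaling keeps the average credit unchanged and shifts more of it onto tokens marked as important.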
Reinforcement learning · Image generation · Benchmarks

Summary generated by Claude — human-verified