Sketch Then Paint: Hierarchical Reinforcement Learning for Diffusion Multi-Modal Large Language Models
Signal
72
Hype
25
In three lines
HT-GRPO, a hierarchical reinforcement learning method for diffusion multi-modal models, organizes optimization into three stages (global, structure, refinement). It solves multiple unmasking sequences and assigns differentiated rewards based on token importance. Tests on MMaDA and Lumina-DiMOO show gains on GenEval and DPG benchmarks.
Summary generated by Claude — human-verified