arXiv cs.AI·19 May 2026

SIPO: Stabilized and Improved Preference Optimization for Aligning Diffusion Models

In three lines

SIPO stabilizes diffusion model alignment to human preferences by addressing training instability and off-policy bias. The method introduces DPO-C&M to clip uninformative timesteps and applies timestep-aware importance reweighting. Experiments on SD1.5, SDXL, CogVideoX-2B/5B, and Wan2.1-1.3B demonstrate improvements over Diffusion-DPO.

Image generation · Video generation · Reinforcement learning · Alignment · Papers

Summary generated by Claude — human-verified
