arXiv cs.AI

SIPO: Stabilized and Improved Preference Optimization for Aligning Diffusion Models

Signal 72 · Hype 18
In three lines: SIPO stabilizes the alignment of diffusion models to human preferences by addressing training instability and off-policy bias. The method introduces DPO-C&M to clip uninformative timesteps and applies timestep-aware importance reweighting. Experiments on SD1.5, SDXL, CogVideoX-2B/5B, and Wan2.1-1.3B show improvements over Diffusion-DPO.
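The summary does not give SIPO's equations. What follows is therefore only a minimal sketch, in plain Python, of where timestep clipping and importance reweighting could enter a Diffusion-DPO-style pairwise loss. The inputs are per-pair squared denoising errors ||ε − ε_pred||² from the trained and reference models on the preferred (w) and dispreferred (l) samples at a sampled timestep t, and the timestep-dependent scale of the Diffusion-DPO objective is folded into `beta`. The cutoff `max_informative_t` and the caller-supplied `weights` are hypothetical stand-ins. They are not the paper's DPO-C&M criterion or its importance weights.

```python
import math
from typing import Optional, Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def clipped_weighted_dpo_loss(
    model_err_w: Sequence[float],  # ||eps_w - eps_theta(x_t^w, t)||^2 per pair
    ref_err_w: Sequence[float],    # ||eps_w - eps_ref(x_t^w, t)||^2 per pair
    model_err_l: Sequence[float],  # ||eps_l - eps_theta(x_t^l, t)||^2 per pair
    ref_err_l: Sequence[float],    # ||eps_l - eps_ref(x_t^l, t)||^2 per pair
    timesteps: Sequence[int],      # sampled timestep t for each pair
    beta: float = 1.0,             # absorbs Diffusion-DPO's timestep scale (assumption)
    max_informative_t: int = 1000, # HYPOTHETICAL clipping cutoff, not SIPO's rule
    weights: Optional[Sequence[float]] = None,  # HYPOTHETICAL importance weights
) -> float:
    n = len(timesteps)
    if weights is None:
        weights = [1.0] * n
    total, weight_sum = 0.0, 0.0
    for i in range(n):
        # Hypothetical clipping: treat timesteps above the cutoff as uninformative
        # and drop them from the loss entirely.
        if timesteps[i] > max_informative_t:
            continue
        # Diffusion-DPO margin: the model's improvement over the reference on the
        # preferred sample minus its improvement on the dispreferred sample.
        margin = (model_err_w[i] - ref_err_w[i]) - (model_err_l[i] - ref_err_l[i])
        # -log sigmoid(-beta * margin) == softplus(beta * margin)
        total += weights[i] * softplus(beta * margin)
        weight_sum += weights[i]
    # Self-normalized weighted mean; returns 0.0 if every pair was clipped.
    return total / weight_sum if weight_sum > 0 else 0.0


if __name__ == "__main__":
    # Identical errors give a zero margin, so the loss is log 2.
    print(clipped_weighted_dpo_loss([1.0], [1.0], [1.0], [1.0], [10]))  # ≈ 0.6931
```

A pair whose timestep exceeds the cutoff contributes nothing, and a pair where the model beats the reference more on the preferred sample gets a loss below log 2. Dividing by the weight sum keeps the loss scale steady when many pairs are clipped. This is one plausible choice, and the paper may normalize differently.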
Image generation · Video generation · Reinforcement learning · Alignment · Papers

Summary generated by Claude — human-verified