SIPO: Stabilized and Improved Preference Optimization for Aligning Diffusion Models
SIPO stabilizes diffusion model alignment to human preferences by addressing training instability and off-policy bias. The method introduces DPO-C&M to clip uninformative timesteps and applies timestep-aware importance reweighting. Experiments on SD1.5, SDXL, CogVideoX-2B/5B, and Wan2.1-1.3B demonstrate improvements over Diffusion-DPO.