Trust-Region Diffusion Policies for Massively Parallel On-Policy RL
Signal: 78
Hype: 25
In three lines
TruDi enables diffusion policies for massively parallel on-policy RL by integrating trust-region optimization with KL-divergence constraints over entire diffusion trajectories. Evaluated on 73 tasks across 4 benchmarks, it outperforms baselines on standard tasks and achieves clear gains on challenging humanoid control.
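To make "KL-divergence constraints over entire diffusion trajectories" concrete, here is a minimal sketch of what such a constraint could look like. It is not taken from the TruDi paper. It assumes each denoising step is an isotropic Gaussian whose variance is fixed and shared by the old and new policy, so the per-step KL reduces to a scaled squared distance between the predicted means, and the trajectory-level KL (by the chain rule for Markov chains) is the sum of per-step KLs along a trajectory sampled from the old policy. The names `step_kl`, `trajectory_kl`, `within_trust_region`, and the bound `delta` are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def step_kl(mu_old: Vector, mu_new: Vector, sigma: float) -> float:
    """KL( N(mu_old, sigma^2 I) || N(mu_new, sigma^2 I) ) for one denoising step.

    Assumption: both policies share the same fixed isotropic variance,
    so the KL reduces to ||mu_old - mu_new||^2 / (2 * sigma^2).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(mu_old) != len(mu_new):
        raise ValueError("mean vectors must have the same dimension")
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu_old, mu_new))
    return sq_dist / (2.0 * sigma ** 2)


def trajectory_kl(
    old_means: Sequence[Vector],
    new_means: Sequence[Vector],
    sigmas: Sequence[float],
) -> float:
    """Single-sample estimate of the KL over a whole denoising trajectory.

    Sums the per-step KLs along one trajectory sampled from the old policy.
    Averaging this over many sampled trajectories would estimate the full
    trajectory-level KL under the Gaussian assumption above.
    """
    if not (len(old_means) == len(new_means) == len(sigmas)):
        raise ValueError("all inputs must cover the same number of steps")
    return sum(step_kl(o, n, s) for o, n, s in zip(old_means, new_means, sigmas))


def within_trust_region(kl: float, delta: float) -> bool:
    """Check a (hypothetical) trust-region bound: KL must not exceed delta."""
    return kl <= delta


if __name__ == "__main__":
    # Two denoising steps of a 2-D action; values are illustrative only.
    old = [[0.0, 0.0], [1.0, 1.0]]
    new = [[0.1, 0.0], [1.0, 1.2]]
    sigmas = [1.0, 0.5]
    kl = trajectory_kl(old, new, sigmas)
    print("trajectory KL:", kl, "inside bound:", within_trust_region(kl, 0.1))
```

The point of the sketch is that the constraint applies to the sum over all denoising steps rather than to the final action distribution alone. How TruDi actually estimates and enforces the bound is described in the source paper, not here.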
Summary generated by Claude — human-verified