arXiv cs.LG

Not All Transitions Matter: Evidence from PPO

Signal: 72
Hype: 15
In three lines: An arXiv paper shows that consecutive transitions in on-policy RL are redundant and that this redundancy causes hidden instability. Randomly dropping 25% of transitions in PPO stabilizes training, as measured by KL divergence, entropy, and value estimates, without degrading rewards. The results cover CartPole, Acrobot, LunarLander, HalfCheetah, and Hopper.
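To make the idea of dropping transitions concrete, here is a minimal NumPy sketch of uniform random subsampling of a rollout batch. It is an illustration, not the paper's implementation: the function name `drop_transitions`, the dictionary layout of the batch, and the choice to drop after advantages and returns have been computed are all assumptions, since the summary does not say where in the PPO pipeline the drop happens.

```python
import numpy as np

def drop_transitions(batch, drop_frac=0.25, rng=None):
    """Return a copy of `batch` with a random `drop_frac` of transitions removed.

    `batch` is a hypothetical dict of equally long arrays, one row per transition.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(next(iter(batch.values())))
    n_keep = n - int(round(drop_frac * n))
    # Sample the indices to keep without replacement; sort to preserve time order.
    keep = np.sort(rng.choice(n, size=n_keep, replace=False))
    return {key: values[keep] for key, values in batch.items()}

# Synthetic stand-in for a PPO rollout (assumed shapes, CartPole-like).
rng = np.random.default_rng(0)
rollout = {
    "obs": rng.normal(size=(2048, 4)),
    "actions": rng.integers(0, 2, size=2048),
    "advantages": rng.normal(size=2048),
    "returns": rng.normal(size=2048),
    "old_log_probs": rng.normal(size=2048),
}

subset = drop_transitions(rollout, drop_frac=0.25, rng=rng)
print(subset["obs"].shape)  # (1536, 4)
```

In this sketch the policy and value updates would run on `subset` instead of `rollout`. Dropping after advantage estimation keeps the return computation intact; whether the paper does the same is not stated in the summary.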
Reinforcement learning · Papers · Benchmarks

Summary generated by Claude — human-verified