Variance reduction for policy gradient with action-dependent factorized baselines
OpenAI publishes a variance reduction method for policy gradient algorithms that uses action-dependent factorized baselines. By lowering the variance of the gradient estimator, the technique makes training of reinforcement learning models more efficient.
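To make the idea concrete, here is a minimal toy sketch, not the published implementation. It assumes a policy that factorizes into independent Bernoulli action dimensions, and a baseline for each dimension that may depend on the actions chosen in the other dimensions. The reward function, the parameter values, and all function names are hypothetical and chosen only for illustration. The script enumerates every action exactly, so it can compare the mean and variance of the score-function gradient estimator with and without such a baseline.

```python
import math
from itertools import product


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(a):
    # Hypothetical reward over a 2-dimensional binary action.
    return 3.0 * a[0] + 2.0 * a[1] + 1.0


def no_baseline(i, a, probs):
    return 0.0


def factorized_baseline(i, a, probs):
    # Assumed form: b_i(a_{-i}) = E_{a_i ~ pi_i}[reward(a_i, a_{-i})].
    # It depends on the other dimensions' actions but not on a_i itself,
    # which is what keeps the gradient estimator unbiased.
    a_on = list(a)
    a_on[i] = 1
    a_off = list(a)
    a_off[i] = 0
    return probs[i] * reward(a_on) + (1.0 - probs[i]) * reward(a_off)


def estimator_stats(thetas, baseline):
    """Exact per-dimension mean and variance of the gradient estimator,
    computed by enumerating all actions instead of sampling."""
    probs = [sigmoid(t) for t in thetas]
    n = len(thetas)
    mean = [0.0] * n
    second_moment = [0.0] * n
    for a in product([0, 1], repeat=n):
        # Probability of the joint action under the factorized policy.
        p_a = 1.0
        for i in range(n):
            p_a *= probs[i] if a[i] == 1 else 1.0 - probs[i]
        for i in range(n):
            # d/d theta_i of log pi_i(a_i) for a Bernoulli with sigmoid logit.
            score = a[i] - probs[i]
            g = score * (reward(a) - baseline(i, a, probs))
            mean[i] += p_a * g
            second_moment[i] += p_a * g * g
    var = [second_moment[i] - mean[i] ** 2 for i in range(n)]
    return mean, var


thetas = [0.3, -0.2]  # hypothetical policy parameters
mean_plain, var_plain = estimator_stats(thetas, no_baseline)
mean_fact, var_fact = estimator_stats(thetas, factorized_baseline)

print("mean without baseline:", mean_plain)
print("mean with factorized baseline:", mean_fact)
print("variance without baseline:", var_plain)
print("variance with factorized baseline:", var_fact)
```

In this toy setting the two estimators have the same mean, while the variance is lower in every dimension when the baseline is used. How large the reduction is in practice depends on the task and on how well the baseline is learned.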