Equivalence between policy gradients and soft Q-learning
OpenAI shows that policy gradient methods and soft Q-learning are mathematically equivalent in entropy-regularized reinforcement learning. This theoretical result unifies two major families of RL algorithms and suggests that their respective advantages can be combined.
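
As a rough illustration of the kind of correspondence involved (a sketch, not the derivation from the work itself), consider a single state with an entropy bonus weighted by a temperature. Under that assumption, a set of Q-values determines a Boltzmann policy and a soft value, and the policy and value together determine the Q-values again. The function names, the Q-values, and the temperature `tau` below are all hypothetical and chosen only for the example.

```python
import math

def soft_value(q_values, tau):
    """Soft (log-sum-exp) value: V = tau * log(sum_a exp(Q(a) / tau))."""
    m = max(q_values)  # subtract the max for numerical stability
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

def boltzmann_policy(q_values, tau):
    """Policy induced by the Q-values: pi(a) = exp((Q(a) - V) / tau)."""
    v = soft_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]

def q_from_policy(policy, value, tau):
    """Recover Q-values from a policy and a value: Q(a) = V + tau * log pi(a)."""
    return [value + tau * math.log(p) for p in policy]

# Hypothetical numbers: one state, three actions.
q = [1.0, 2.0, 0.5]
tau = 0.5  # assumed temperature (entropy coefficient)

v = soft_value(q, tau)
pi = boltzmann_policy(q, tau)
q_back = q_from_policy(pi, v, tau)

# Entropy-regularized return of pi in a one-step problem whose rewards are q.
entropy = -sum(p * math.log(p) for p in pi)
regularized_return = sum(p * r for p, r in zip(pi, q)) + tau * entropy

print("policy:", pi)
print("soft value:", v)
print("recovered Q:", q_back)
print("regularized return:", regularized_return)
```

In this one-step setting the recovered Q-values match the originals and the regularized return of the Boltzmann policy equals the soft value, so a Q-function and a (policy, value) pair are two descriptions of the same object. The full result concerns gradients in the sequential setting, which this sketch does not attempt to reproduce.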