Better exploration with parameter noise
OpenAI shows that adding adaptive noise to reinforcement learning algorithm parameters frequently boosts performance. This exploration method is simple to implement and rarely decreases performance.
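As a rough illustration of the idea, the sketch below perturbs the weights of a toy policy with Gaussian noise and adapts the noise scale so that the perturbed policy's actions stay a target distance from the unperturbed ones. The linear policy, the names `perturb`, `policy`, and `adapt_sigma`, and the values of `target` and `factor` are all assumptions for illustration, not OpenAI's implementation.

```python
import numpy as np

def perturb(params, sigma, rng):
    """Return a noisy copy of the parameters (Gaussian noise with scale sigma)."""
    return [p + sigma * rng.standard_normal(p.shape) for p in params]

def policy(params, obs):
    """Toy linear policy with tanh squashing (hypothetical stand-in for a real network)."""
    w, b = params
    return np.tanh(obs @ w + b)

def adapt_sigma(sigma, distance, target, factor=1.01):
    """Grow sigma if the perturbed policy acts too similarly, shrink it otherwise."""
    return sigma * factor if distance < target else sigma / factor

rng = np.random.default_rng(0)
params = [rng.standard_normal((4, 2)), np.zeros(2)]
obs = rng.standard_normal((32, 4))   # a batch of made-up observations
sigma, target = 0.5, 0.1             # assumed starting scale and target action distance

for _ in range(200):
    noisy = perturb(params, sigma, rng)
    # Distance in action space between the unperturbed and perturbed policies.
    distance = np.sqrt(np.mean((policy(params, obs) - policy(noisy, obs)) ** 2))
    sigma = adapt_sigma(sigma, distance, target)
```

Measuring the distance in action space rather than parameter space keeps the amount of exploration comparable even as the sensitivity of the network to its weights changes during training.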
OpenAI releases PPO (Proximal Policy Optimization), a class of reinforcement learning algorithms simpler to implement and tune than existing approaches, with comparable or superior performance. PPO has become OpenAI's default RL algorithm.
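The summary does not say which PPO variant is meant; the sketch below assumes the commonly used clipped surrogate objective, which limits how far the probability ratio between the new and old policy can move the update. The function name `ppo_clip_objective` and the default `clip_eps=0.2` are illustrative choices.

```python
import numpy as np

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate objective (to be maximized), averaged over samples."""
    ratio = np.exp(new_logp - old_logp)   # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the minimum gives a pessimistic bound on the policy improvement.
    return np.mean(np.minimum(unclipped, clipped))
```

With a positive advantage, the objective stops rewarding the policy once the ratio exceeds `1 + clip_eps`; with a negative advantage, the unclipped term dominates, so large harmful changes are still penalized.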
OpenAI creates adversarial images that reliably fool neural network classifiers across varied scales and perspectives. This challenges a recent claim that self-driving cars would be hard to trick maliciously because they capture images from multiple angles.
OpenAI publishes Hindsight Experience Replay (HER), a reinforcement learning method enabling agents to learn from failed experiences by retroactively reframing goals. This technique substantially improves training efficiency on complex tasks.
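A minimal sketch of the goal-relabeling step, assuming a sparse reward (0 on reaching the goal, -1 otherwise) and a strategy that substitutes the state actually reached at the end of the episode as the goal. The transition layout (`achieved`, `goal`, `reward` keys) and both function names are hypothetical.

```python
import numpy as np

def sparse_reward(achieved, goal, tol=1e-6):
    """Return 0.0 if the achieved state matches the goal, otherwise -1.0."""
    return 0.0 if np.linalg.norm(achieved - goal) <= tol else -1.0

def relabel_with_final_goal(episode):
    """Copy an episode, pretending the final achieved state was the goal all along."""
    new_goal = episode[-1]["achieved"]
    return [
        dict(t, goal=new_goal, reward=sparse_reward(t["achieved"], new_goal))
        for t in episode
    ]

# A failed episode: the agent wanted to reach 5.0 but only got to 2.0.
goal = np.array([5.0])
episode = [
    {"achieved": np.array([x]), "goal": goal, "reward": sparse_reward(np.array([x]), goal)}
    for x in (0.0, 1.0, 2.0)
]
relabeled = relabel_with_final_goal(episode)
```

Storing both `episode` and `relabeled` in the replay buffer gives the agent at least one successful transition to learn from, even though the original goal was never reached.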
OpenAI introduces a teacher-student curriculum learning approach where a teacher model generates progressively harder tasks to train a student model. The method improves learning efficiency by adapting training example difficulty to the student model's skill level.