OpenAI Baselines: ACKTR & A2C
OpenAI releases two Baselines implementations: A2C, a synchronous, deterministic variant of A3C, and ACKTR, a reinforcement learning algorithm that is more sample-efficient than TRPO and A2C at a comparable computational cost per update.
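What "synchronous" means in A2C can be sketched briefly. Under the assumption of an n-step actor-critic setup, every parallel worker finishes its short rollout, and then the advantages for the whole batch are computed together and used for a single update. In A3C, by contrast, each worker updates shared parameters on its own schedule. This is a minimal illustration, not the Baselines code. The names `nstep_returns` and `a2c_advantages` are hypothetical, and the policy/value networks and the gradient step itself are left out.

```python
from typing import List, Sequence


def nstep_returns(rewards: Sequence[float], dones: Sequence[bool],
                  bootstrap_value: float, gamma: float = 0.99) -> List[float]:
    """Discounted n-step returns for one worker's rollout, computed backwards.

    bootstrap_value is the critic's estimate V(s_{t+n}) of the state reached
    after the last step; it only counts if the episode has not ended.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        if dones[t]:
            # Episode ended after step t: later rewards must not leak back.
            running = 0.0
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def a2c_advantages(batch_rewards, batch_dones, batch_values, batch_bootstrap,
                   gamma: float = 0.99) -> List[List[float]]:
    """Advantages A_t = R_t - V(s_t) for all workers at once.

    Hypothetical helper: in synchronous A2C this runs only after every worker
    has finished its rollout, so the whole batch feeds one gradient update.
    """
    advantages = []
    for rewards, dones, values, boot in zip(batch_rewards, batch_dones,
                                            batch_values, batch_bootstrap):
        returns = nstep_returns(rewards, dones, boot, gamma)
        advantages.append([r - v for r, v in zip(returns, values)])
    return advantages


# Two workers, three steps each; gamma = 0.5 keeps the arithmetic exact.
adv = a2c_advantages(
    batch_rewards=[[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
    batch_dones=[[False, False, False], [False, True, False]],
    batch_values=[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    batch_bootstrap=[0.0, 10.0],
    gamma=0.5,
)
print(adv)  # [[1.75, 1.5, 1.0], [0.5, 0.0, 5.0]]
```

In the second worker the episode ends after step 1. As a result, the bootstrap value of 10.0 only reaches step 2, and steps 0 and 1 see just the rewards from their own episode.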
OpenAI demonstrates that, given sufficient compute, self-play can catapult ML systems from subhuman to superhuman performance. Within a month, the system progressed from matching top-ranked players to beating professionals, and it continued to improve afterward. Unlike supervised learning, which is limited by its training data, self-play automatically generates better data as the agent improves.
OpenAI created a bot that defeats world-class professional Dota 2 players in 1v1 matches under standard tournament rules. The bot learned entirely through self-play, without imitation learning or tree search. This is a step toward AI systems that accomplish well-defined goals in complex real-world environments.
OpenAI releases RL-Teacher, an open-source implementation for training AIs from occasional human feedback instead of hand-crafted reward functions. The technique is meant to help develop safe AI systems and applies to reinforcement learning problems where rewards are hard to specify.
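The summary does not say how occasional human feedback becomes a reward. The sketch below shows one common formulation, assuming the pairwise-comparison setup from preference-based RL. In that setup a human picks the better of two short trajectory clips, and a reward model is fit so that the clip it scores higher tends to be the one the human preferred, under a Bradley-Terry model over summed per-step predicted rewards. This is not RL-Teacher's code. The function names are hypothetical, and the reward model itself and the RL agent that optimizes its predictions are omitted.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_probability(rewards_a: Sequence[float],
                           rewards_b: Sequence[float]) -> float:
    """Hypothetical helper: P(human prefers clip A over clip B) under a
    Bradley-Terry model, scoring each clip by the sum of predicted rewards."""
    d = sum(rewards_a) - sum(rewards_b)
    # sigmoid(d) written as exp(-softplus(-d)) so large |d| cannot overflow.
    return math.exp(-softplus(-d))


def preference_loss(rewards_a: Sequence[float], rewards_b: Sequence[float],
                    label: float) -> float:
    """Cross-entropy between the model's preference and the human label.

    label is 1.0 if A was preferred, 0.0 if B was, 0.5 if judged equal.
    Minimizing this over many labeled pairs trains the reward model.
    """
    d = sum(rewards_a) - sum(rewards_b)
    # -log sigmoid(d) = softplus(-d); -log(1 - sigmoid(d)) = softplus(d)
    return label * softplus(-d) + (1.0 - label) * softplus(d)


clip_a = [0.5, 1.5]  # predicted per-step rewards for clip A (made-up values)
clip_b = [0.0, 0.0]  # predicted per-step rewards for clip B (made-up values)
print(round(preference_probability(clip_a, clip_b), 4))  # 0.8808
print(round(preference_loss(clip_a, clip_b, 1.0), 4))    # 0.1269
print(round(preference_loss(clip_a, clip_b, 0.0), 4))    # 2.1269
```

The loss is small when the model already agrees with the human's choice and large when it disagrees. Only relative clip scores matter, which is why a few comparisons can stand in for a hand-written reward function.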