August 2017

4 articles

OpenAI Baselines: ACKTR & A2C

OpenAI releases two Baselines implementations: A2C, a synchronous, deterministic variant of A3C, and ACKTR, a reinforcement learning algorithm that is more sample-efficient than TRPO and A2C while requiring only slightly more computation per update than A2C.

Reinforcement learning · Open source · OpenAI
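
As a rough illustration of the "advantage" in advantage actor-critic methods such as A2C, the sketch below computes discounted n-step returns and advantages for a single rollout segment. The function name `n_step_advantages`, the discount factor, and the toy numbers are illustrative assumptions and are not taken from the Baselines code. It also ignores episode boundaries for brevity. In a synchronous setup, segments like this from all parallel environments would be collected first and then combined into one update.

```python
from typing import List, Tuple


def n_step_advantages(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    """Compute discounted returns and advantages for one rollout segment.

    rewards[t] is the reward at step t, values[t] is the critic's estimate
    for the state at step t, and bootstrap_value is the critic's estimate
    for the state that follows the last step.
    """
    returns: List[float] = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one that follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    # Advantage: how much better the observed return was than predicted.
    advantages = [ret - val for ret, val in zip(returns, values)]
    return returns, advantages


# Toy segment (hypothetical numbers chosen so the arithmetic is exact).
returns, advantages = n_step_advantages(
    rewards=[1.0, 0.0, 1.0],
    values=[0.5, 0.5, 0.5],
    bootstrap_value=0.0,
    gamma=0.5,
)
print(returns)     # [1.25, 0.5, 1.0]
print(advantages)  # [0.75, 0.0, 0.5]
```

The policy gradient would then weight each action's log-probability by its advantage, while the critic is regressed toward the returns.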

OpenAI Blog·Aug 16

More on Dota 2

OpenAI reports that self-play can take ML systems from far below human level to superhuman performance, given sufficient compute. Within a month, the system progressed from barely matching a high-ranked player to beating top professionals, and it has continued to improve. Unlike supervised learning, which is limited by a fixed training set, self-play automatically generates better data as the agent improves.

OpenAI · Reinforcement learning · Benchmarks

OpenAI Blog·Aug 11

Dota 2

OpenAI created a bot that defeats world-class Dota 2 professionals in 1v1 matches under standard tournament rules. The bot learned entirely through self-play, without imitation learning or tree search. OpenAI presents this as a step toward AI systems that accomplish well-defined goals in complex real-world situations.

OpenAI · Reinforcement learning · AI Agents

OpenAI Blog·Aug 3

Gathering human feedback

OpenAI releases RL-Teacher, an open-source implementation of its interface for training AIs via occasional human feedback rather than hand-crafted reward functions. The underlying technique was developed as a step toward safe AI systems, and it also applies to reinforcement learning problems where rewards are hard to specify.

OpenAI · Reinforcement learning · AI safety
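
To make "human feedback instead of a reward function" more concrete, here is a minimal sketch of one common way to learn a reward from pairwise comparisons: a human picks the better of two short clips, and a reward model is trained so that the preferred clip has the higher summed predicted reward. The function names and numbers are hypothetical, and the summary above does not say that RL-Teacher uses exactly this form.

```python
import math
from typing import Sequence


def preference_probability(
    rewards_a: Sequence[float], rewards_b: Sequence[float]
) -> float:
    """Probability that clip A is preferred, modeled as a logistic
    function of the difference in summed predicted rewards."""
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable sigmoid.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(
    rewards_a: Sequence[float],
    rewards_b: Sequence[float],
    human_prefers_a: bool,
) -> float:
    """Cross-entropy between the model's preference and the human label."""
    diff = sum(rewards_a) - sum(rewards_b)
    margin = diff if human_prefers_a else -diff
    # -log(sigmoid(margin)), written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical per-step reward predictions for two clips.
clip_a = [0.2, 0.4, 0.9]
clip_b = [0.1, 0.0, 0.3]

# The loss is lower when the human label agrees with the reward model.
print(preference_loss(clip_a, clip_b, human_prefers_a=True))
print(preference_loss(clip_a, clip_b, human_prefers_a=False))
```

Minimizing this loss over many labeled pairs pushes the reward model toward the human's judgments, and the learned reward can then stand in for a hand-written one during ordinary reinforcement learning.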