OpenAI Blog · 13 June 2017

Learning from human preferences

In three lines: OpenAI and DeepMind develop a preference-learning algorithm that infers human objectives without an explicit reward function, reducing the risk of undesirable AI behavior.

OpenAI · DeepMind · Reinforcement learning · Alignment · AI safety

Summary generated by Claude — human-verified
