OpenAI Blog

Learning to summarize with human feedback

Signal: 75
Hype: 25
In three lines
OpenAI trains language models to summarize text using reinforcement learning from human feedback (RLHF). The approach improves the quality of the generated summaries.
OpenAI, Reinforcement learning, Alignment

Summary generated by Claude — human-verified