OpenAI Blog·19 April 2024

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

In three lines: OpenAI introduces an instruction hierarchy that trains LLMs to prioritize privileged instructions and to resist prompt injections and jailbreaks. The method enables models to distinguish system directives from malicious user inputs.

OpenAI · AI safety · Alignment · Prompt engineering

Summary generated by Claude — human-verified
