Back to feed
arXiv cs.CL·

Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs

Signal
78
Hype
15
In three linesStudy of instruction-following vs. pattern-completion conflict across 13 LLMs. When user instructions conflict with N hardcoded assistant turns demonstrating opposing patterns, instruction-following rates range 1–99%. Transition is universal but model-dependent. Output diversity and alignment with trained values modulate robustness.
Read source
Your take?
ReasoningAlignmentEvalsAI safety

Summary generated by Claude — human-verified