arXiv cs.CL

Alignment Drift in Long-Term Human-LLM Interaction: A Mechanism-Oriented Framework

Signal: 72 · Hype: 18
In three lines
A study of 'alignment drift': the gradual process by which an LLM's outputs become less constrained by the user's current message and more shaped by the interaction history, while still remaining helpful. The mechanism-oriented framework distinguishes signal A/B, feedback loops, and interactive regimes in order to control this cumulative drift.
Alignment · AI Agents · AI safety

Summary generated by Claude — human-verified