Back to feed
arXiv cs.AI·

When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents

Signal
78
Hype
25
In three linesStudy reveals a safety vulnerability in personalized dialogue agents: long-term memory biases intent inference and legitimizes harmful queries. PS-Bench benchmark shows personalization increases attack success rates by 15.8%–243.7% versus stateless baselines. A lightweight detection-reflection method is proposed to mitigate this safety degradation.
Read source
Your take?
AI safetyAI AgentsBenchmarksPapersAlignment

Summary generated by Claude — human-verified