arXiv cs.AI·19 May 2026

When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents

Signal

Hype

In three linesStudy reveals a safety vulnerability in personalized dialogue agents: long-term memory biases intent inference and legitimizes harmful queries. PS-Bench benchmark shows personalization increases attack success rates by 15.8%–243.7% versus stateless baselines. A lightweight detection-reflection method is proposed to mitigate this safety degradation.

Read source

Your take?

AI safety AI Agents Benchmarks Papers Alignment

Summary generated by Claude — human-verified

When Personalization Legitimizes Risks: Uncovering Safety Vulnerabilities in Personalized Dialogue Agents

Other angles on this story