arXiv cs.LG

$\Psi$-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues

Signal: 72
Hype: 28
In three lines: Ψ-Bench is a benchmark that assesses how well LLMs can persuade realistic users through conversation. Ten frontier models were tested on three real-world scenarios. Giving models access to user profiles yields an 18.24% performance gain. Code is available.
Benchmarks, Prompt engineering, AI Agents

Summary generated by Claude — human-verified