MirrorBench: A Benchmark to Evaluate Conversational User-Proxy Agents for Human-Likeness
Signal: 75
Hype: 15
In three lines: MirrorBench is a benchmarking framework for evaluating user-proxy agents in conversational systems. It combines six metrics (MATTR, Yule's K, HD-D, GTEval, Pairwise Indistinguishability, and Rubric-and-Reason) to measure how realistic LLM-generated user utterances are across four public datasets. The code is released as open source.
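To give a feel for two of the lexical-diversity metrics named above, here is a minimal sketch of MATTR (moving-average type-token ratio) and Yule's K, using their standard textbook definitions. It is not MirrorBench's own implementation: the function names, the default window of 50 tokens, and the assumption that utterances are already tokenized are all illustrative choices.

```python
from collections import Counter


def mattr(tokens, window=50):
    """Moving-average type-token ratio: mean TTR over all sliding windows.

    The window size of 50 is an assumed default, not a value from MirrorBench.
    """
    if not tokens:
        return 0.0
    # For short inputs, fall back to the plain type-token ratio.
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


def yules_k(tokens):
    """Yule's K = 10^4 * (sum_i i^2 * V_i - N) / N^2.

    V_i is the number of distinct types that occur exactly i times and
    N is the total token count. Higher K means more repetition.
    """
    n = len(tokens)
    if n == 0:
        return 0.0
    freq_of_freqs = Counter(Counter(tokens).values())
    s2 = sum(i * i * v for i, v in freq_of_freqs.items())
    return 1e4 * (s2 - n) / (n * n)
```

Higher MATTR indicates more varied vocabulary, while higher Yule's K indicates more repetition, so the two move in opposite directions on the same text.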
Summary generated by Claude — human-verified