arXiv cs.AI · 19 May 2026

Can LLMs Think Like Consumers? Benchmarking Crowd-Level Reaction Reconstruction with ConsumerSimBench

In three lines: ConsumerSimBench, a benchmark built from 1,553 Chinese social-media topics and 23,122 reaction criteria, evaluates whether LLMs can reconstruct real consumer reaction patterns. Gemini-3.1-Pro covers only 47.8% of criteria, revealing a major gap between technical performance and consumer intuition. A multi-agent pipeline improves MiMo-V2.5-Pro from 32.9% to 37.6%.

Benchmarks Evals Multi-agent Reasoning

Summary generated by Claude — human-verified
