Can LLMs Think Like Consumers? Benchmarking Crowd-Level Reaction Reconstruction with ConsumerSimBench
Signal: 78
Hype: 25
In three lines
ConsumerSimBench, a benchmark built from 1,553 Chinese social-media topics and 23,122 reaction criteria, evaluates whether LLMs can reconstruct real consumer reaction patterns. Gemini-3.1-Pro achieves only 47.8% coverage of criteria, revealing a major gap between technical performance and socially grounded consumer intuition.
Summary generated by Claude — human-verified