EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent
Signal: 82
Hype: 15
In three lines
EComAgentBench is a benchmark of 662 e-commerce tasks evaluating LLM-based shopping agents on hidden intents distributed across query, user profile, and clarifications. Requirements are scattered and agents must uncover them within 100 tool calls. The strongest model achieves only 57.1% accuracy.
Summary generated by Claude — human-verified