arXiv cs.AI · 17 June 2026

EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent

In three lines: EComAgentBench is a benchmark of 662 e-commerce tasks that evaluates LLM-based shopping agents on hidden intents distributed across the query, the user profile, and clarifications. Because the requirements are scattered, agents must uncover them within a budget of 100 tool calls. The strongest model achieves only 57.1% accuracy.

Summary generated by Claude and human-verified.