arXiv cs.AI

EgoBench: An Interactive Egocentric Multimodal Benchmark for Tool-Using Agents

Signal: 78
Hype: 25
In three lines: EgoBench is an interactive multimodal benchmark for tool-using agents, with 1,045 egocentric-video tasks across four daily scenarios. Of eight state-of-the-art video MLLMs evaluated, the best reaches only 30.62% accuracy and the average is 19.43%, exposing bottlenecks in visual perception and multi-hop reasoning.
AI Agents · Vision · Benchmarks · Multi-agent

Summary generated by Claude — human-verified