EnactToM: An Evolving Benchmark for Functional Theory of Mind in Embodied Agents
Signal: 78
Hype: 25
In three lines: EnactToM is an evolving benchmark with 300 multi-agent embodied tasks in 3D household environments with partial observability. It tests functional Theory of Mind (acting optimally on implicit beliefs) rather than literal belief questions. All seven frontier models score 0.0% on hard task completion, with 93% of failures traced to epistemic coordination breakdowns.
Summary generated by Claude — human-verified