arXiv cs.AI · 27 May 2026

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

In three lines

OmniToM is a benchmark evaluating theory of mind in LLMs through explicit belief modeling. Built on 895 stories (22,343 annotated belief propositions), it tests extraction and labeling of mental states across 7 dimensions. Results show current LLMs struggle to transform narrative facts into actors' beliefs and shared mental states.

Benchmarks Reasoning Evals

Summary generated by Claude — human-verified