arXiv cs.AI

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

Signal: 78
Hype: 15
In three lines: OmniToM is a benchmark that evaluates theory of mind in LLMs through explicit belief modeling. Built on 895 stories with 22,343 annotated belief propositions, it tests how well models extract and label mental states across 7 dimensions. Results show that current LLMs struggle to turn narrative facts into actors' beliefs and shared mental states.
Tags: Benchmarks, Reasoning, Evals

Summary generated by Claude — human-verified