arXiv cs.AI

Beyond the Cartesian Illusion: Testing Two-Stage Multi-Modal Theory of Mind under Perceptual Bottlenecks

Signal: 72 · Hype: 28
In three lines
An arXiv paper on the spatial limitations of multimodal large language models (MLLMs) in multi-agent environments. The models suffer from a "Cartesian Illusion": they lack grounded 3D topological understanding. The authors propose an Epistemic Sensory Bottleneck module with Anchor-Based Embodied Spatial Decomposition CoT to improve second-order spatial inference (Theory of Mind). The zero-shot baseline reaches 42% accuracy.
Vision · Multi-agent · Reasoning · Benchmarks

Summary generated by Claude — human-verified