arXiv cs.AI·22 May 2026

OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

In three lines: OSCToM combines reinforcement learning with surrogate models to generate observer-agent conflicts for Theory of Mind tasks. On FANToM, an information-asymmetric benchmark, OSCToM-8B reaches 76% accuracy versus 0.2% for ExploreToM. Data synthesis is 6x more efficient.

Reasoning · Reinforcement learning · Benchmarks · Papers

Summary generated by Claude — human-verified
