OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind
Signal: 78
Hype: 25
In three lines: OSCToM combines RL and surrogate models to generate observer-agent conflicts in Theory of Mind tasks. On FANToM, an information-asymmetric benchmark, OSCToM-8B reaches 76% accuracy vs 0.2% for ExploreToM. Data synthesis is 6x more efficient.
Summary generated by Claude — human-verified