arXiv cs.AI

OSCToM: RL-Guided Adversarial Generation for High-Order Theory of Mind

Signal: 78
Hype: 25
In three lines: OSCToM combines reinforcement learning (RL) and surrogate models to generate observer-agent conflicts in Theory of Mind tasks. On FANToM (an information-asymmetric benchmark), OSCToM-8B reaches 76% accuracy vs. 0.2% for ExploreToM. Data synthesis is 6x more efficient.
Reasoning · Reinforcement learning · Benchmarks · Papers

Summary generated by Claude — human-verified