arXiv cs.AI

MindGames Arena Generalization Track: In2AI Solution with Delayed Per-Step Reward Attribution

Signal: 82
Hype: 25
In three lines: The paper presents a delayed per-step reward attribution method for training LLM agents in multi-agent strategic interaction. An 8-billion-parameter open-source model trained with this approach matched or surpassed GPT-5 and won both the Open and Efficient tracks of the MindGames Arena benchmark (NeurIPS 2025).
AI Agents, Multi-agent, Reinforcement learning, Benchmarks

Summary generated by Claude — human-verified