arXiv cs.AI

MAPLE: Multi-State Aggregated Policy Evaluation for AlphaZero in Imperfect-Information Games

Signal: 72 · Hype: 18
In three lines: MAPLE is a tree search method that extends AlphaZero to imperfect-information games by aggregating policy and value evaluations across multiple sampled world states. Tested on Phantom Go and Dark Hex, it outperforms a PIMC-AlphaZero baseline, with Elo gains of 291 and 136, respectively.
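To make the aggregation idea concrete, here is a minimal sketch of combining policy and value evaluations over several sampled world states. The summary does not say how MAPLE weights or combines the samples, so uniform averaging is an assumption here, and `aggregate_evaluations`, `toy_evaluate`, and the state labels are hypothetical names used only for illustration, not the paper's method.

```python
from typing import Callable, Dict, Sequence, Tuple

Policy = Dict[str, float]                       # action -> probability
Evaluator = Callable[[str], Tuple[Policy, float]]  # world state -> (policy, value)


def aggregate_evaluations(
    world_states: Sequence[str], evaluate: Evaluator
) -> Tuple[Policy, float]:
    """Average policy and value over sampled world states.

    Assumption: every sample gets equal weight. The actual method may
    weight samples differently.
    """
    if not world_states:
        raise ValueError("need at least one sampled world state")

    policy_sum: Policy = {}
    value_sum = 0.0
    for state in world_states:
        policy, value = evaluate(state)
        for action, prob in policy.items():
            policy_sum[action] = policy_sum.get(action, 0.0) + prob
        value_sum += value

    n = len(world_states)
    # Renormalize so the aggregated policy sums to 1 even if some
    # actions are missing from some samples.
    total = sum(policy_sum.values())
    if total > 0:
        avg_policy = {a: p / total for a, p in policy_sum.items()}
    else:
        avg_policy = dict(policy_sum)
    return avg_policy, value_sum / n


def toy_evaluate(state: str) -> Tuple[Policy, float]:
    """Stand-in for a network evaluation (hypothetical fixed outputs)."""
    table = {
        "w1": ({"a": 0.8, "b": 0.2}, 0.5),
        "w2": ({"a": 0.4, "b": 0.6}, -0.1),
    }
    return table[state]


agg_policy, agg_value = aggregate_evaluations(["w1", "w2"], toy_evaluate)
print(agg_policy, agg_value)
```

In this toy run the two sampled states disagree on the best action, and the aggregate leans toward `a` while keeping weight on `b`, which is the kind of hedging across hidden states that evaluating a single sampled world cannot express.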
Reasoning · Reinforcement learning · Benchmarks

Summary generated by Claude — human-verified