arXiv cs.LG

Self-Play Reinforcement Learning under Imperfect Information in Big 2

Signal: 72 · Hype: 15
In three lines: A self-play RL study of Big 2, a four-player imperfect-information card game. PPO outperforms Q-learning, SARSA, and Monte Carlo Q-approximation against random, greedy, and heuristic opponents. Moderate entropy regularization and current-policy self-play improve performance in this controlled multiplayer setting.
Reinforcement learning · Multi-agent · Benchmarks

Summary generated by Claude — human-verified