arXiv cs.LG·29 May 2026

Self-Play Reinforcement Learning under Imperfect Information in Big 2

In three lines: Self-play RL study in Big 2, a four-player imperfect-information card game. PPO outperforms Q-learning, SARSA, and Monte Carlo Q-approximation against random, greedy, and heuristic opponents. Moderate entropy regularization and current-policy self-play improve performance in this controlled multiplayer setting.

Reinforcement learning · Multi-agent · Benchmarks

Summary generated by Claude — human-verified
