A View on Deep Reinforcement Learning in Imperfect Information Games

  • T.V. Pricope, University of Edinburgh, School of Informatics, Informatics Forum, 10 Crichton Street, Edinburgh, UK, EH8 9AB

Abstract

Many real-world applications can be described as large-scale games of imperfect information. These games are considerably harder than deterministic ones, as the search space is even larger. In this paper, I explore the power of reinforcement learning in such an environment; to that end, I study one of the most popular games of this type, no-limit Texas Hold'em Poker, which remains unsolved, developing multiple agents with different learning paradigms and techniques and then comparing their respective performances. When applied to no-limit Hold'em Poker, deep reinforcement learning agents clearly outperform agents with a more traditional approach. Moreover, while the latter agents rival a beginner human player, the ones based on reinforcement learning compare to an amateur human player. The main algorithm uses Fictitious Play in combination with ANNs and some handcrafted metrics. We also applied the main algorithm to another game of imperfect information, less complex than Poker, in order to demonstrate the scalability of this solution and its increase in performance when compared head to head with established classical approaches from the reinforcement learning literature.
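
To make the fictitious-play component mentioned above concrete, the sketch below runs classical fictitious play [4] on rock-paper-scissors: each player repeatedly best-responds to the opponent's empirical average strategy, and the average strategies converge toward the Nash equilibrium. This is only an illustration of the game-theoretic core; the agents described in the paper replace the tabular best response and average policy with neural networks trained from self-play (in the spirit of [6, 7]), which is not shown here.

```python
# Minimal sketch of classical fictitious play on rock-paper-scissors.
# Illustrative only: the paper's agents use neural approximations of the
# best response and the average policy, not this tabular loop.
import numpy as np

# Payoff matrix for player 1 (rows) against player 2 (columns): R, P, S.
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)

def best_response(opponent_avg_strategy, payoff):
    """Pure strategy maximizing expected payoff against the opponent's average play."""
    expected = payoff @ opponent_avg_strategy
    return int(np.argmax(expected))

def fictitious_play(iterations=10000):
    counts = np.ones((2, 3))  # action counts per player, initialized uniformly
    for _ in range(iterations):
        avg = counts / counts.sum(axis=1, keepdims=True)
        a1 = best_response(avg[1], PAYOFF)      # player 1 answers player 2's average
        a2 = best_response(avg[0], -PAYOFF.T)   # player 2 answers player 1's average
        counts[0, a1] += 1
        counts[1, a2] += 1
    return counts / counts.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    strategies = fictitious_play()
    # Both rows approach the uniform Nash equilibrium (1/3, 1/3, 1/3).
    print("Average strategies:\n", strategies)
```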

References

[1] Arulkumaran, K., Cully, A., and Togelius, J. AlphaStar: An evolutionary computation perspective. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (2019), pp. 314–315.
[2] Billings, D., Papp, D., Schaeffer, J., and Szafron, D. Opponent modeling in poker. In AAAI/IAAI (1998), pp. 493–499.
[3] Bowling, M., Burch, N., Johanson, M., and Tammelin, O. Heads-up limit hold’em poker is solved. Science 347, 6218 (2015), 145–149.
[4] Brown, G. W. Iterative solution of games by fictitious play. Activity analysis of production and allocation 13, 1 (1951), 374–376.
[5] Brown, N., and Sandholm, T. Superhuman AI for multiplayer poker. Science 365, 6456 (2019), 885–890.
[6] Heinrich, J., Lanctot, M., and Silver, D. Fictitious self-play in extensive-form games. In International Conference on Machine Learning (2015), pp. 805–813.
[7] Heinrich, J., and Silver, D. Deep reinforcement learning from self-play in imperfect-information games. arXiv preprint arXiv:1603.01121 (2016).
[8] Lambert Iii, T. J., Epelman, M. A., and Smith, R. L. A fictitious play approach to large-scale optimization. Operations Research 53, 3 (2005), 477–489.
[9] Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., and Bowling, M. DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science 356, 6337 (2017), 508–513.
[10] Nevmyvaka, Y., Feng, Y., and Kearns, M. Reinforcement learning for optimized trade execution. In Proceedings of the 23rd international conference on Machine learning (2006), pp. 673–680.
[11] Sewak, M. Deep Q Network (DQN), Double DQN and Dueling DQN. In Deep Reinforcement Learning. Springer, 2019, pp. 95–108.
[12] Shamma, J. S., and Arslan, G. Dynamic fictitious play, dynamic gradient play, and distributed convergence to nash equilibria. IEEE Transactions on Automatic Control 50, 3 (2005), 312–327.
[13] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., et al. Mastering the game of Go without human knowledge. Nature 550, 7676 (2017), 354–359.
[14] Stipić, A., Bronzin, T., Prole, B., and Pap, K. Deep learning advancements: closing the gap. In 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (2019), IEEE, pp. 1087–1092.
[15] Sutton, R. S., and Barto, A. G. Reinforcement learning: An introduction. MIT press, 2018.
[16] Tidor-Vlad, P. Deep reinforcement learning in imperfect information games: Texas Hold'em Poker. Bachelor's Thesis, July 2020.
[17] Urieli, D., and Stone, P. TacTex'13: A champion adaptive power trading agent. In Twenty-Eighth AAAI Conference on Artificial Intelligence (2014).
[18] Van Hasselt, H., Guez, A., and Silver, D. Deep reinforcement learning with Double Q-learning. In Thirtieth AAAI Conference on Artificial Intelligence (2016).
[19] Vitter, J. S. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS) 11, 1 (1985), 37–57.
[20] Watkins, C. J., and Dayan, P. Q-learning. Machine learning 8, 3-4 (1992), 279–292.
[21] Zhang, L., Wang, W., Li, S., and Pan, G. Monte carlo neural fictitious self-play: Approach to approximate nash equilibrium of imperfect-information games. arXiv preprint arXiv:1903.09569 (2019).
Published
2020-12-09
How to Cite
Pricope, T.V. A View on Deep Reinforcement Learning in Imperfect Information Games. Studia Universitatis Babeș-Bolyai Informatica, vol. 65, no. 2, pp. 31–49, Dec. 2020. ISSN 2065-9601. Available at: <https://www.cs.ubbcluj.ro/~studia-i/journal/journal/article/view/57>. DOI: https://doi.org/10.24193/subbi.2020.2.03.