A View on Deep Reinforcement Learning in Imperfect Information Games
Many real-world applications can be described as large-scale games of imperfect information. Such games are considerably harder than deterministic, perfect-information ones because the search space is even larger. In this paper, I explore the power of reinforcement learning in this setting by examining one of the most popular games of this type, no-limit Texas Hold'em Poker, which remains unsolved, developing multiple agents with different learning paradigms and techniques and then comparing their respective performances. When applied to no-limit Hold'em Poker, deep reinforcement learning agents clearly outperform agents with a more traditional approach. Moreover, while the latter agents rival a human beginner's level of play, the reinforcement-learning-based agents compare to an amateur human player. The main algorithm combines Fictitious Play with artificial neural networks and some handcrafted metrics. We also applied the main algorithm to another game of imperfect information, less complex than Poker, in order to show the scalability of this solution and the increase in performance when compared head to head with established classical approaches from the reinforcement learning literature.
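To make the core idea concrete, the sketch below shows classic fictitious play (Brown, 1951) on rock-paper-scissors, a small zero-sum game of the kind the paper's algorithm generalizes. This is a minimal illustration, not the paper's neural variant: each player repeatedly best-responds to the opponent's empirical action frequencies, and those frequencies converge toward the Nash equilibrium. The function name and iteration count are illustrative choices.

```python
import numpy as np

# Row player's payoff matrix for rock-paper-scissors (zero-sum).
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def fictitious_play(A, iters=20000):
    """Run fictitious play; return the players' empirical mixed strategies."""
    n, m = A.shape
    row_counts = np.zeros(n)
    col_counts = np.zeros(m)
    # Arbitrary initial actions to seed the empirical distributions.
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture.
        r = np.argmax(A @ (col_counts / col_counts.sum()))   # row maximizes
        c = np.argmin((row_counts / row_counts.sum()) @ A)   # column minimizes
        row_counts[r] += 1
        col_counts[c] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

p, q = fictitious_play(A)
# Empirical frequencies approach the uniform equilibrium (1/3, 1/3, 1/3).
```

The neural variant replaces the exact best response with a reinforcement-learning policy and the averaged empirical strategy with a supervised network, which is what makes the approach scale to games as large as Poker.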
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
When the article is accepted for publication, I, as the author and representative of the coauthors, hereby agree to transfer to Studia Universitatis Babes-Bolyai, Series Informatica, all rights, including those pertaining to electronic forms and transmissions, under existing copyright laws, except for the following rights, which the author specifically retains: the right to make further copies of all or part of the published article for my use in classroom teaching; the right to reuse all or part of this material in a review or in a textbook of which I am the author; the right to make copies of the published work for internal distribution within the institution that employs me.