Hernandez, Daniel and Denamganai, Kevin and Devlin, Sam and Samothrakis, Spyridon and Walker, James Alfred (2022) A Comparison of Self-Play Algorithms Under a Generalized Framework. IEEE Transactions on Games, 14 (2). pp. 221-231. DOI https://doi.org/10.1109/tg.2021.3058898
Abstract
The notion of self-play, albeit often cited in multiagent Reinforcement Learning as a process by which to train agent policies from scratch, has received little effort toward being taxonomized within a formal model. We present a formalized framework, with clearly defined assumptions, which encapsulates the meaning of self-play as abstracted from various existing self-play algorithms. This framework is framed as an approximation to a theoretical solution concept for multiagent training. Through a novel qualitative visualization metric, on a simple environment, we show that different self-play algorithms generate different distributions of episode trajectories, leading to different explorations of the policy space by the learning agents. Quantitatively, on two environments, we analyze the learning dynamics of policies trained under different self-play algorithms captured under our framework and perform cross self-play performance comparisons. Our results indicate that, throughout training, various widely used self-play algorithms exhibit cyclic policy evolutions and that the choice of self-play algorithm significantly affects the final performance of trained agents.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Training; Games; Measurement; Statistics; Sociology; Heuristic algorithms; Reinforcement learning; Emergent phenomena; machine learning; multi-agent systems |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 13 Oct 2021 14:04 |
Last Modified: | 30 Oct 2024 16:39 |
URI: | http://repository.essex.ac.uk/id/eprint/31006 |
Available files
Filename: 2006.04471v1.pdf