TY - GEN
T1 - Expediting Self-Play Learning in AlphaZero-Style Game-Playing Agents
AU - Björnsson, Yngvi
AU - Jónsson, Róbert Leó Þormar
AU - Jónsson, Sigurjón Ingi
N1 - Publisher Copyright: © 2023 The Authors.
PY - 2023
Y1 - 2023
AB - One of the main appeals of AlphaZero-style game-playing agents, which combine deep learning and Monte Carlo Tree Search, is that they can be trained autonomously without external expert-level domain knowledge. However, training such agents is generally computationally expensive, with the most computationally time-consuming step being generating training data via self-play. Here we propose an improved strategy for generating self-play training data, resulting in higher-quality samples, especially in earlier training phases. The new strategy initially emphasizes the latter game phases and gradually extends those phases to entire games as the training progresses. In our test domains, the games Connect4 and Breakthrough, we show that game-playing agents using the improved training approach learn significantly faster than counterpart agents using a standard approach. Furthermore, we empirically show that the proposed strategy is (in our test domains) superior to several recently proposed strategies for expediting self-play learning in game playing.
UR - https://www.scopus.com/pages/publications/85175789625
UR - https://iris.ru.is/ws/files/217767220/FAIA-372-FAIA230279.pdf
U2 - 10.3233/FAIA230279
DO - 10.3233/FAIA230279
M3 - Conference contribution
T3 - Frontiers in Artificial Intelligence and Applications
SP - 263
EP - 270
BT - ECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings
A2 - Gal, Kobi
A2 - Nowé, Ann
A2 - Nalepa, Grzegorz J.
A2 - Fairstein, Roy
A2 - Radulescu, Roxana
PB - IOS Press BV
T2 - 26th European Conference on Artificial Intelligence, ECAI 2023
Y2 - 30 September 2023 through 4 October 2023
ER -