Robles, David and Rohlfshagen, Philipp and Lucas, Simon M (2011) Learning non-random moves for playing Othello: Improving Monte Carlo Tree Search. In: 2011 IEEE Conference on Computational Intelligence and Games (CIG), 2011-08-31 - 2011-09-03.
Robles, David and Rohlfshagen, Philipp and Lucas, Simon M (2011) Learning non-random moves for playing Othello: Improving Monte Carlo Tree Search. In: 2011 IEEE Conference on Computational Intelligence and Games (CIG), 2011-08-31 - 2011-09-03.
Robles, David and Rohlfshagen, Philipp and Lucas, Simon M (2011) Learning non-random moves for playing Othello: Improving Monte Carlo Tree Search. In: 2011 IEEE Conference on Computational Intelligence and Games (CIG), 2011-08-31 - 2011-09-03.
Abstract
Monte Carlo Tree Search (MCTS) with an appropriate tree policy may be used to approximate a minimax tree for games such as Go, where a state value function cannot be formulated easily: recent MCTS algorithms successfully combine Upper Confidence Bounds for Trees with Monte Carlo (MC) simulations to incrementally refine estimates on the game-theoretic values of the game's states. Although a game-specific value function is not required for this approach, significant improvements in performance may be achieved by derandomising the MC simulations using domain-specific knowledge. However, recent results suggest that the choice of a non-uniformly random default policy is non-trivial and may often lead to unexpected outcomes. In this paper we employ Temporal Difference Learning (TDL) as a general approach to the integration of domain-specific knowledge in MCTS and subsequently study its impact on the algorithm's performance. In particular, TDL is used to learn a linear function approximator that is used as an a priori bias to the move selection in the algorithm's default policy; the function approximator is also used to bias the values of the nodes in the tree directly. The goal of this work is to determine whether such a simplistic approach can be used to improve the performance of MCTS for the well-known board game OTHELLO. The analysis of the results highlights the broader conclusions that may be drawn with respect to non-random default policies in general. © 2011 IEEE.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Additional Information: | Published proceedings: 2011 IEEE Conference on Computational Intelligence and Games, CIG 2011 |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Science and Health Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 19 Oct 2012 21:39 |
Last Modified: | 30 Oct 2024 19:54 |
URI: | http://repository.essex.ac.uk/id/eprint/4115 |