Runarsson, Thomas Philip and Lucas, Simon M (2012) Imitating play from game trajectories: Temporal difference learning versus preference learning. In: 2012 IEEE Conference on Computational Intelligence and Games (CIG), 2012-09-11 - 2012-09-14.
Abstract
This work compares the learning of linear evaluation functions using preference learning versus least squares temporal difference learning, LSTD(λ), from samples of game trajectories. The game trajectories are taken from human competitions held by the French Othello Federation. The raw board positions are used to create a linear evaluation function to illustrate the key difference between the two learning approaches. The results show that the policies learned, using exactly the same game trajectories, can be quite different. For the simple set of features used, preference learning produces policies that better capture the behaviour of expert players, and also lead to higher levels of play when compared to LSTD(λ). © 2012 IEEE.
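To make the contrast in the abstract concrete, the following is a minimal sketch of the two ways a linear evaluation function might be fitted from the same trajectories. It is not the authors' implementation. The function names, the toy feature vectors, the reward convention (a reward attached to each visited state), the ridge term, and the use of a simple perceptron-style update as a stand-in for preference learning are all assumptions made for illustration.

```python
# Illustrative sketch only; names, toy data and hyperparameters are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def lstd_lambda(episodes, n, lam=0.5, gamma=1.0, ridge=1e-6):
    """Fit weights w so that w . phi(s) approximates the return from s.

    episodes: list of (features, rewards), where rewards[t] is received
    on leaving state t (assumed convention). The state after the last
    one is terminal and has zero features.
    """
    # Small ridge term (an assumption) keeps A invertible on tiny data sets.
    A = [[ridge if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [0.0] * n
    for feats, rewards in episodes:
        z = [0.0] * n  # eligibility trace
        for t, phi in enumerate(feats):
            nxt = feats[t + 1] if t + 1 < len(feats) else [0.0] * n
            z = [gamma * lam * zi + pi for zi, pi in zip(z, phi)]
            d = [pi - gamma * ni for pi, ni in zip(phi, nxt)]
            for i in range(n):
                b[i] += z[i] * rewards[t]
                for j in range(n):
                    A[i][j] += z[i] * d[j]
    return solve(A, b)

def preference_perceptron(choices, n, epochs=50, lr=0.1):
    """Fit weights w so that the expert's chosen position outscores
    every alternative available at the same decision point.

    choices: list of (chosen_features, [alternative_features, ...]).
    """
    w = [0.0] * n
    for _ in range(epochs):
        for chosen, alts in choices:
            for alt in alts:
                if dot(w, chosen) <= dot(w, alt):  # preference violated
                    w = [wi + lr * (c - a) for wi, c, a in zip(w, chosen, alt)]
    return w

# Toy usage: a two-state episode ending in a win (reward 1), and one
# decision where the expert preferred position [1, 0] over [0, 1].
w_td = lstd_lambda([([[1.0, 0.0], [0.0, 1.0]], [0.0, 1.0])], n=2)
w_pref = preference_perceptron([([1.0, 0.0], [[0.0, 1.0]])], n=2)
```

The sketch highlights the difference in the training signal: the LSTD(λ) fit only sees the positions that were actually reached and the eventual outcome, whereas the preference fit also uses the moves the expert declined to play.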
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Additional Information: | Published proceedings: 2012 IEEE Conference on Computational Intelligence and Games, CIG 2012 |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 19 Oct 2012 22:52 |
Last Modified: | 24 Oct 2024 21:47 |
URI: | http://repository.essex.ac.uk/id/eprint/4125 |