Imitating play from game trajectories: Temporal difference learning versus preference learning

Runarsson, TP and Lucas, SM (2012) Imitating play from game trajectories: Temporal difference learning versus preference learning. In: 2012 IEEE Conference on Computational Intelligence and Games (CIG 2012).

Full text not available from this repository.

Abstract

This work compares the learning of linear evaluation functions using preference learning versus least squares temporal difference learning, LSTD(λ), from samples of game trajectories. The game trajectories are taken from human competitions held by the French Othello Federation. The raw board positions are used to create a linear evaluation function to illustrate the key difference between the two learning approaches. The results show that the policies learned, using exactly the same game trajectories, can be quite different. For the simple set of features used, preference learning produces policies that better capture the behaviour of expert players, and also lead to higher levels of play when compared to LSTD(λ). © 2012 IEEE.
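To illustrate the contrast the abstract describes, here is a minimal, hypothetical sketch of the two approaches for a linear evaluation function v(s) = w·φ(s). This is not the paper's code: the function names (`lstd_lambda`, `preference_perceptron`), the trajectory/preference data layout, and the perceptron-style preference update are all assumptions made for illustration.

```python
import numpy as np

def lstd_lambda(trajectories, n_features, gamma=1.0, lam=0.7):
    """LSTD(lambda) for a linear value function v(s) = w . phi(s).

    trajectories: list of episodes, each a list of (phi, reward) pairs,
    where phi is the feature vector of a board position.
    (Illustrative interface, not the paper's actual setup.)
    """
    A = np.zeros((n_features, n_features))
    b = np.zeros(n_features)
    for episode in trajectories:
        z = np.zeros(n_features)  # eligibility trace
        for t, (phi, r) in enumerate(episode):
            z = gamma * lam * z + phi
            # Terminal successor is the zero feature vector.
            phi_next = (episode[t + 1][0] if t + 1 < len(episode)
                        else np.zeros(n_features))
            A += np.outer(z, phi - gamma * phi_next)
            b += z * r
    # Small ridge term for numerical stability.
    return np.linalg.solve(A + 1e-6 * np.eye(n_features), b)

def preference_perceptron(move_choices, n_features, epochs=10, lr=0.1):
    """Pairwise preference learning: the expert's chosen successor
    position should score above every alternative from the same state.

    move_choices: list of (phi_chosen, [phi_alt, ...]) pairs.
    """
    w = np.zeros(n_features)
    for _ in range(epochs):
        for phi_chosen, alternatives in move_choices:
            for phi_alt in alternatives:
                if w @ (phi_chosen - phi_alt) <= 0:
                    w += lr * (phi_chosen - phi_alt)
    return w
```

The key difference is visible in the data each learner consumes: LSTD(λ) fits values to game outcomes propagated along the trajectory, while preference learning only constrains the *ranking* between the expert's chosen move and its unchosen alternatives.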

Item Type: Conference or Workshop Item (Paper)
Additional Information: Published proceedings: 2012 IEEE Conference on Computational Intelligence and Games, CIG 2012
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Jim Jamieson
Date Deposited: 19 Oct 2012 22:52
Last Modified: 26 Jun 2018 13:15
URI: http://repository.essex.ac.uk/id/eprint/4125
