Imitating play from game trajectories: Temporal difference learning versus preference learning

Runarsson, Thomas Philip and Lucas, Simon M (2012) Imitating play from game trajectories: Temporal difference learning versus preference learning. In: 2012 IEEE Conference on Computational Intelligence and Games (CIG), 2012-09-11 - 2012-09-14.

Abstract

This work compares the learning of linear evaluation functions using preference learning versus least squares temporal difference learning, LSTD(λ), from samples of game trajectories. The game trajectories are taken from human competitions held by the French Othello Federation. The raw board positions are used to create a linear evaluation function to illustrate the key difference between the two learning approaches. The results show that the policies learned, using exactly the same game trajectories, can be quite different. For the simple set of features used, preference learning produces policies that better capture the behaviour of expert players, and also lead to higher levels of play when compared to LSTD(λ). © 2012 IEEE.

Item Metadata

Item Type:	Conference or Workshop Item (Paper)
Additional Information:	Published proceedings: 2012 IEEE Conference on Computational Intelligence and Games, CIG 2012
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:	Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor:	Unnamed user with email elements@essex.ac.uk
Depositing User:	Unnamed user with email elements@essex.ac.uk
Date Deposited:	19 Oct 2012 22:52
Last Modified:	17 Jun 2025 13:45
URI:	http://repository.essex.ac.uk/id/eprint/4125
