Abdullahi, Aisha A and Lucas, Simon M (2011) Temporal difference learning with interpolated n-tuples: Initial results from a simulated car racing environment. In: 2011 IEEE Conference on Computational Intelligence and Games (CIG), 2011-08-31 - 2011-09-03.
Abstract
Evolutionary algorithms have been used successfully in car racing game competitions, such as those based on TORCS. This is in contrast to temporal difference learning (TDL), which, despite being a powerful learning algorithm, has not been used to any significant extent within these competitions. We believe this is mainly due to the difficulty of choosing a good function approximator, the potential instability of the learning behaviour (and hence the reliability of the results), and the lack of a forward model, which restricts the choice of TDL algorithms. This paper reports our initial results on using a new type of function approximator designed to be used with TDL for problems with a large number of continuous-valued inputs, where function approximators such as multi-layer perceptrons can be unstable. The approach combines interpolated tables with n-tuple systems. To conduct the research in a flexible and efficient way, we developed a new car-racing simulator that runs much more quickly than TORCS and gives us full access to the forward model of the system. We investigate different types of tracks and physics models, compare against human drivers, and report some initial tests with evolutionary learning (EL). The results show that each approach leads to a different driving style, and either TDL or EL can learn best depending on the details of the environment. Significantly, TDL produced the best results when learning state-action values (similar to Q-learning; no forward model needed). Regarding driving style, TDL consistently learned behaviours that avoid damage, while EL tended to evolve fast but reckless drivers. © 2011 IEEE.
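The abstract describes the core idea at a high level: approximate the state-action value Q(s, a) with a bank of n-tuples whose lookup tables are read and written through multilinear interpolation, trained with one-step TD updates. Below is a minimal Python sketch of that combination; the class names, random tuple sampling, grid resolution, and learning-rate schedule are all illustrative assumptions, since the abstract does not specify the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


class InterpolatedNTuple:
    """One n-tuple: reads a fixed subset of the (normalised) inputs,
    discretises each into `bins` cells, and stores values in a table
    accessed via multilinear interpolation over the 2^n vertices of
    the enclosing grid cell, so the value estimate stays continuous."""

    def __init__(self, dims, bins):
        self.dims = np.asarray(dims)  # input dimensions this tuple samples
        self.bins = bins
        self.table = np.zeros((bins + 1,) * len(dims))
        n = len(dims)
        # 0/1 corner offsets of the enclosing grid cell
        self.corners = np.array(
            [[(i >> k) & 1 for k in range(n)] for i in range(2 ** n)]
        )

    def _locate(self, x):
        """Return the base vertex and interpolation weights for input x."""
        scaled = np.clip(x[self.dims], 0.0, 1.0) * self.bins
        base = np.minimum(scaled.astype(int), self.bins - 1)
        frac = scaled - base
        weights = np.prod(np.where(self.corners == 1, frac, 1.0 - frac), axis=1)
        return base, weights

    def value(self, x):
        base, weights = self._locate(x)
        return sum(w * self.table[tuple(base + c)]
                   for w, c in zip(weights, self.corners))

    def update(self, x, delta, alpha):
        """Spread a TD error over the cell vertices using the same
        interpolation weights used for reading."""
        base, weights = self._locate(x)
        for w, c in zip(weights, self.corners):
            self.table[tuple(base + c)] += alpha * w * delta


class NTupleQ:
    """Q(s, a) as a sum over several randomly sampled n-tuples,
    with one independent bank of tuples per discrete action."""

    def __init__(self, n_inputs, n_actions, n_tuples=8, tuple_size=2, bins=10):
        self.banks = [
            [InterpolatedNTuple(rng.choice(n_inputs, tuple_size, replace=False),
                                bins)
             for _ in range(n_tuples)]
            for _ in range(n_actions)
        ]

    def q(self, s, a):
        return sum(t.value(s) for t in self.banks[a])

    def step(self, s, a, r, s_next, done, alpha=0.05, gamma=0.95):
        """One-step Q-learning update: the bootstrap target only needs
        the observed next state, so no forward model is required."""
        best_next = 0.0 if done else max(self.q(s_next, b)
                                         for b in range(len(self.banks)))
        delta = (r + gamma * best_next) - self.q(s, a)
        for t in self.banks[a]:
            t.update(s, delta, alpha / len(self.banks[a]))
```

In the racing setting, the state vector would plausibly hold the car's sensor readings normalised to [0, 1] and the actions a small discrete set of steering/throttle commands; those details are assumptions for illustration, not taken from the abstract.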
| Item Type | Conference or Workshop Item (Paper) |
|---|---|
| Additional Information | Published proceedings: 2011 IEEE Conference on Computational Intelligence and Games, CIG 2011 |
| Subjects | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
| Divisions | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor | Unnamed user with email elements@essex.ac.uk |
| Depositing User | Unnamed user with email elements@essex.ac.uk |
| Date Deposited | 19 Oct 2012 21:40 |
| Last Modified | 30 Oct 2024 19:54 |
| URI | http://repository.essex.ac.uk/id/eprint/4116 |