Fairbank, Michael and Alonso, Eduardo (2012) Value-gradient learning. In: The 2012 International Joint Conference on Neural Networks (IJCNN), 2012-06-10 - 2012-06-15, Brisbane, QLD, Australia.
Abstract
We describe an Adaptive Dynamic Programming algorithm, VGL(λ), for learning a critic function over a large continuous state space. The algorithm, which requires a learned model of the environment, extends Dual Heuristic Dynamic Programming to include a bootstrapping parameter analogous to that used in the reinforcement learning algorithm TD(λ). We provide on-line and batch mode implementations of the algorithm, and summarise its theoretical relationships to its precursor algorithms, Dual Heuristic Dynamic Programming and TD(λ), and the motivations for using it over them. Experiments on control problems using a neural network and greedy policy are provided.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Additional Information: | Published proceedings: The 2012 International Joint Conference on Neural Networks (IJCNN) |
Uncontrolled Keywords: | Value-Gradient Learning; Dual Heuristic Dynamic Programming; DHP; Adaptive Dynamic Programming |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 14 Apr 2021 13:38 |
Last Modified: | 25 Oct 2024 03:57 |
URI: | http://repository.essex.ac.uk/id/eprint/21299 |
Available files
Filename: PID2286117.pdf