Fairbank, M and Alonso, E and Prokhorov, D (2013) An Equivalence Between Adaptive Dynamic Programming With a Critic and Backpropagation Through Time. IEEE Transactions on Neural Networks and Learning Systems, 24 (12). pp. 2088-2100. DOI https://doi.org/10.1109/tnnls.2013.2271778
Abstract
We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), which is designed to learn a critic function, when using learned model functions of the environment. DHP is designed for optimizing control problems in large and continuous state spaces. We extend DHP into a new algorithm that we call Value-Gradient Learning, VGL(λ), and prove equivalence of an instance of the new algorithm to Backpropagation Through Time for Control with a greedy policy. Not only does this equivalence provide a link between these two different approaches, but it also enables our variant of DHP to have guaranteed convergence, under certain smoothness conditions and a greedy policy, when using a general smooth nonlinear function approximator for the critic. We consider several experimental scenarios including some that prove divergence of DHP under a greedy policy, which contrasts against our proven-convergent algorithm.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Adaptive Dynamic Programming; Dual Heuristic Programming; Value-Gradient Learning; Backpropagation Through Time; Neural Networks |
Subjects: | B Philosophy. Psychology. Religion > BF Psychology; Q Science > QA Mathematics > QA75 Electronic computers. Computer science; R Medicine > RC Internal medicine > RC0321 Neuroscience. Biological psychiatry. Neuropsychiatry |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 05 Aug 2016 14:04 |
Last Modified: | 30 Oct 2024 09:21 |
URI: | http://repository.essex.ac.uk/id/eprint/17372 |