An equivalence between adaptive dynamic programming with a critic and backpropagation through time

Fairbank, M and Alonso, E and Prokhorov, D (2013) 'An equivalence between adaptive dynamic programming with a critic and backpropagation through time.' IEEE Transactions on Neural Networks and Learning Systems, 24 (12). 2088 - 2100. ISSN 2162-237X

EQUIV-RCO.pdf - Accepted Version

Abstract

We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), which learns a critic function using learned model functions of the environment. DHP is designed for optimizing control problems in large and continuous state spaces. We extend DHP into a new algorithm that we call Value-Gradient Learning, VGL(λ), and prove that an instance of the new algorithm is equivalent to Backpropagation Through Time for Control with a greedy policy. Not only does this equivalence provide a link between these two different approaches, but it also gives our variant of DHP guaranteed convergence, under certain smoothness conditions and a greedy policy, when using a general smooth nonlinear function approximator for the critic. We consider several experimental scenarios, including some that demonstrate divergence of DHP under a greedy policy, in contrast to our proven-convergent algorithm. © 2012 IEEE.

Item Type: Article
Subjects: B Philosophy. Psychology. Religion > BF Psychology
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
R Medicine > RC Internal medicine > RC0321 Neuroscience. Biological psychiatry. Neuropsychiatry
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Jim Jamieson
Date Deposited: 05 Aug 2016 14:04
Last Modified: 30 Jan 2019 16:23
URI: http://repository.essex.ac.uk/id/eprint/17372
