Yilmaz, Ahmet (2021) Uncovering Efficient Learning and Initialisation Algorithms for Neural Networks Using Evolutionary Algorithms and Theoretical Analyses. PhD thesis, University of Essex.
Abstract
Artificial Neural Networks (ANNs) are one of the most widely used forms of machine learning algorithm. Over the years, numerous types of ANN have been developed and applied to many domains. However, there are still important problems to overcome, including their slow learning and the inability of certain types of deep ANNs to learn at all, due to the vanishing gradient problem. This thesis attempted to solve these problems via novel, efficient learning and initialisation algorithms. One of the tools used to do this is Genetic Programming (GP), a form of program evolution. Very little research had been done on the use of GP to induce learning rules for ANNs. This thesis started from where others left off and also developed a rigorous methodology for fairly comparing learning rules. GP was able to evolve a learning rule that is both fast and general. A qualitative interpretation of the rule and empirical evidence showed that it is superior to the standard back-propagation algorithm. The vanishing gradient problem is a long-standing obstacle to the training of deep ANNs using sigmoid activation functions, and the methods proposed in the literature to alleviate it have not been very successful. This thesis first used GP to discover an initialisation algorithm that solves the problem. We then performed an in-depth analysis of the evolved algorithm and a theoretical analysis of the extent to which the vanishing gradient problem depends on the choice of the mean of the initial weight distribution. Both indicated that initialising the weights with a carefully selected negative mean gives large initial gradients in weight space. Empirical verification finally showed that, starting from such a good initial position, the standard back-propagation algorithm is successful and efficient at training deep networks with 10 and 15 hidden layers on a standard set of benchmark problems.
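The central idea summarised above, that shifting the mean of the initial weight distribution to a carefully selected negative value can enlarge the initial gradients of a deep sigmoid network, can be sketched in a few lines of NumPy. The snippet below is an illustrative toy, not the initialisation algorithm evolved in the thesis: the network shape, the random input batch, the placeholder mean and standard deviation values, and the unit error signal used in place of a real loss are all assumptions made purely for demonstration. It simply reports how large the gradients reaching the first weight matrix are under a zero-mean versus a negative-mean initialisation, so the effect of the mean can be explored.

```python
# Toy comparison of initial gradient magnitudes in a deep sigmoid network
# under zero-mean vs. negative-mean weight initialisation.
# All numeric choices (layer widths, mean, std, batch size) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_network(layer_sizes, mean, std):
    """Weight matrices drawn from a normal distribution N(mean, std^2)."""
    return [rng.normal(mean, std, size=(n_in, n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def mean_abs_first_layer_gradient(weights, x):
    """Forward pass through sigmoid layers, then back-propagate a unit error
    and return the average absolute gradient of the first weight matrix."""
    activations = [x]
    for W in weights:
        activations.append(sigmoid(activations[-1] @ W))
    out = activations[-1]
    error = np.ones_like(out)                  # stand-in for dLoss/dOutput
    delta = error * out * (1.0 - out)          # sigmoid derivative at the output layer
    for i in range(len(weights) - 1, 0, -1):   # propagate the error back through the stack
        delta = (delta @ weights[i].T) * activations[i] * (1.0 - activations[i])
    grad_W0 = activations[0].T @ delta         # gradient w.r.t. the first weight matrix
    return np.mean(np.abs(grad_W0))

if __name__ == "__main__":
    layers = [10] + [20] * 15 + [1]            # a 15-hidden-layer sigmoid network
    x = rng.normal(size=(32, layers[0]))       # hypothetical random input batch
    for mean in (0.0, -0.1):                   # -0.1 is a placeholder, not the thesis's value
        W = init_network(layers, mean=mean, std=0.3)
        g = mean_abs_first_layer_gradient(W, x)
        print(f"init mean {mean:+.2f}: mean |dLoss/dW0| = {g:.3e}")
```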
Item Type: Thesis (PhD)
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Ahmet Yilmaz
Date Deposited: 10 Mar 2021 16:43
Last Modified: 09 Mar 2024 02:00
URI: http://repository.essex.ac.uk/id/eprint/30013
Available files
Filename: Final_thesis_YILMAZ.pdf