Kihlman, Ragini (2023) Improving classification through co-training approaches. Doctoral thesis, University of Essex.
Abstract
Machine learning methods have found applications in fields as diverse as finance, health care, aviation, and social networking. Such methods rely on the availability of data, and in particular on labelled datasets, to train algorithms that provide predictions. However, labelled datasets are difficult to source because annotating data is an expensive and time-consuming process. This thesis focuses on improving prediction for classification tasks by building on co-training, a semi-supervised learning approach that trains two classifiers on two different views of the data. We present an incremental approach that extends the co-training algorithm and addresses the lack of sufficient labelled data. First, building on Blum’s co-training binary classification model, we develop a multi-label classification model that classifies the data collected by a domain-specific term extraction method. To train the classifier in the domain of human rights, the method uses a labelled set compiled by experts as background knowledge. In the next step, we propose a method that randomly divides the available features and applies matrix factorization to discover the latent features underlying interactions between the different kinds of entities present in a single-view dataset. Using matrix factorization and similarity measures, the next method co-trains on a large corpus of unstructured data in order to classify it correctly. This method evaluates each label and recommends documents with similar labels. In our final step, we implement two neural networks that use two-view semi-supervised learning for text classification. This concept is extended to deep learning by using deep neural networks to train on different views of generated samples, with the similarity between the probability distributions of the predicted outcomes calculated across the networks. Furthermore, the method adds noise so that noise does not affect the classifier during prediction. The co-trained networks thereby improve classification accuracy.
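For readers unfamiliar with co-training, the two-view idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the method developed in the thesis: the nearest-centroid classifier, the toy data, and all function names (`centroid_fit`, `centroid_predict`, `co_train`) are assumptions chosen only to keep the example self-contained. It shows two classifiers, one per view, each pseudo-labelling the unlabelled examples it is most confident about and adding them to a shared labelled pool.

```python
def centroid_fit(X, y):
    """Return {label: centroid} for feature vectors X with labels y."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def centroid_predict(model, x):
    """Return (label, confidence); confidence is the squared-distance
    margin between the nearest and second-nearest centroid."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, c)), label) for label, c in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin


def co_train(view1, view2, labels, rounds=10, per_round=2):
    """Toy co-training loop. labels[i] is a class label or None (unlabelled).
    Each round, one classifier per view is fitted on the labelled pool and
    pseudo-labels its `per_round` most confident unlabelled examples."""
    labels = list(labels)
    for _ in range(rounds):
        known = [i for i, l in enumerate(labels) if l is not None]
        unknown = [i for i, l in enumerate(labels) if l is None]
        if not unknown:
            break
        new_labels = {}
        for view in (view1, view2):
            model = centroid_fit([view[i] for i in known], [labels[i] for i in known])
            scored = [(centroid_predict(model, view[i]), i) for i in unknown]
            scored.sort(key=lambda s: s[0][1], reverse=True)
            for (label, _), i in scored[:per_round]:
                # If both views pick the same example, the first view's label is kept.
                new_labels.setdefault(i, label)
        for i, label in new_labels.items():
            labels[i] = label
    return labels


# Hypothetical toy data: two well-separated classes, two feature views.
truth = [0] * 10 + [1] * 10
view1 = [[5.0 * c + 0.1 * (i % 5), 5.0 * c - 0.1 * (i % 3)] for i, c in enumerate(truth)]
view2 = [[-3.0 + 6.0 * c + 0.2 * (i % 4)] for i, c in enumerate(truth)]
# Only the first two examples of each class start out labelled.
labels = [c if i % 10 < 2 else None for i, c in enumerate(truth)]

completed = co_train(view1, view2, labels)
```

In this sketch both views feed a single shared pool of labels, which is one common simplification of co-training; real text data would need actual feature views and a stronger base classifier than nearest centroid.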
Item Type: | Thesis (Doctoral) |
---|---|
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
Depositing User: | Ragini Gokhale |
Date Deposited: | 02 Feb 2023 11:42 |
Last Modified: | 02 Feb 2023 11:42 |
URI: | http://repository.essex.ac.uk/id/eprint/34744 |
Available files
Filename: PhD_Thesis_Ragini.pdf