Research Repository

Developing Learning Methods for Non-stationary and Imbalanced Data Streams

Almuammar, Manal (2020) Developing Learning Methods for Non-stationary and Imbalanced Data Streams. PhD thesis, University of Essex.

Text: AlmuammarThesis.pdf
Restricted to Repository staff only until 22 May 2023.

Abstract

Recent developments in technology have enhanced the ability of systems to both generate and collect data from a variety of sources. A growing number of Internet of Things devices rapidly generate continuous data streams. Mining these data streams brings new opportunities but also introduces new challenges. Learning from them is difficult because of their characteristics: the data are continuous, unbounded, high-speed and evolving, and must be processed on the fly. An additional challenge arises because many of the data streams generated by real-world applications are themselves imbalanced. This difficulty is more acute in multi-class learning tasks. Although the problems of learning from non-stationary streams and of class imbalance have each been investigated separately in the literature, too little attention has been paid to the multi-class imbalance problem as it emerges in evolving streams. This thesis is devoted to developing new techniques for mining evolving data streams that have skewed distributions and to tackling the multi-class problem in such streams. It presents a new method for classifying heterogeneous data streams that extends current concept drift adaptation techniques so that they can handle imbalanced-class scenarios. To this end, an adaptive learning algorithm is developed that uses a window-based approach and modifies the composition of the training set to improve classification accuracy. In addition, this research proposes a new method for discovering patterns in evolving data streams with skewed distributions. It introduces a dynamically calculated support threshold, which allows the proposed method to tackle the rare-pattern problem as it is encountered in non-stationary streams.
Moreover, an experiment is conducted on forecasting time series from heterogeneous data streams using a deep learning approach, to provide real-time parking prediction in the transportation domain.

Item Type: Thesis (PhD)
Subjects: Q Science > Q Science (General)
T Technology > T Technology (General)
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Manal Almuammar
Date Deposited: 01 Jun 2020 07:55
Last Modified: 01 Jun 2020 07:55
URI: http://repository.essex.ac.uk/id/eprint/27600
