Research Repository

Developing Event Identification Methods for Structured and Unstructured Data Streams

Alkhamees, Nora (2019) Developing Event Identification Methods for Structured and Unstructured Data Streams. PhD thesis, University of Essex.

Abstract

Data, now more than ever before, are continuously being generated in huge volumes, and at rapid speed. Data may originate from various sources, for instance: sensor readings, financial transactions, social networks, etc. A data stream is a continuous sequence of data arriving in almost real-time and often at a high speed.

In this thesis, we are interested in benefiting from the availability of such data and developing methods for detecting the occurrence of events from data streams, such as a text stream and a price time-series stream. Hence, we have explored event identification from structured and unstructured data streams in the domain of finance.

We employ the Directional Change (DC) approach to high frequency time-series streams to identify significant price transitions (i.e. events). DC is an event-based approach for summarizing price movements based on a fixed, a-priori threshold. We propose a dynamic threshold definition method, which replaces the fixed threshold and is appropriate for markets that operate over specific opening and closing times. A dynamic threshold provides more flexibility and extends the DC approach, allowing the identification of price changes in continuously changing environments.

With the proliferation of social media data reporting on all aspects of human activity, being able to automatically identify events is becoming increasingly important. We present a framework for detecting the occurring events on a daily basis, via social network streams. We develop and extend a Frequent Pattern Mining method by proposing a dynamic support definition method to replace the fixed support. As the number of text posts streamed each day is not fixed, a dynamic support can adapt to the nature of data streams and can improve the identification of events.

Finally, we explore whether we can bring together the insights from the time-series stream and the social network stream to understand if events as identified from both streams can be correlated.
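
As a rough illustration of the fixed-threshold Directional Change idea mentioned in the abstract, the following is a minimal sketch and not the method developed in the thesis. It assumes a common textbook formulation: a downturn event is confirmed once the price falls by a fraction theta from the last running high, and an upturn event once it rises by theta from the last running low. The function name `directional_change_events`, the parameter `theta`, and the sample prices are hypothetical and chosen only for this example. The dynamic threshold proposed in the thesis is not reproduced here, since the abstract does not specify it.

```python
from typing import List, Optional, Tuple


def directional_change_events(
    prices: List[float], theta: float
) -> List[Tuple[int, str]]:
    """Return (index, direction) pairs for confirmed DC events.

    Assumption: fixed threshold `theta` (e.g. 0.02 for 2%). This is a
    generic sketch of the DC approach, not the thesis's dynamic variant.
    """
    events: List[Tuple[int, str]] = []
    if not prices:
        return events

    trend: Optional[str] = None  # unknown until the first event is confirmed
    high = low = prices[0]       # running extremes since the last event

    for i, p in enumerate(prices):
        if trend in (None, "up"):
            high = max(high, p)
        if trend in (None, "down"):
            low = min(low, p)

        if trend != "down" and p <= high * (1 - theta):
            # Price dropped by at least theta from the running high.
            events.append((i, "down"))
            trend, low = "down", p
        elif trend != "up" and p >= low * (1 + theta):
            # Price rose by at least theta from the running low.
            events.append((i, "up"))
            trend, high = "up", p

    return events
```

For example, `directional_change_events([100, 101, 103, 100, 99, 98, 101, 102, 99], 0.02)` scans a short, made-up price series with a 2% threshold and returns the indices at which upturn and downturn events are confirmed. A smaller `theta` yields more events; a larger one keeps only the bigger price transitions.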

Item Type: Thesis (PhD)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Nora Alkhamees
Date Deposited: 05 Jul 2019 08:49
Last Modified: 05 Jul 2019 08:51
URI: http://repository.essex.ac.uk/id/eprint/24934
