Research Repository

Exploring embedding vectors for emotion detection

Alshahrani, Mohammed (2020) Exploring embedding vectors for emotion detection. PhD thesis, University of Essex.

[img]
Preview
Text
Mohammed_Alshahrani.pdf

Download (4MB) | Preview

Abstract

Textual data nowadays is being generated in vast volumes. With the proliferation of social media and the prevalence of smartphones, short texts have become a prevalent form of information such as news headlines, tweets and text advertisements. Given the huge volume of short texts available, effective and efficient models to detect the emotions from short texts become highly desirable and in some cases fundamental to a range of applications that require emotion understanding of textual content, such as human computer interaction, marketing, e-learning and health. Emotion detection from text has been an important task in Natural Language Processing (NLP) for many years. Many approaches have been based on the emotional words or lexicons in order to detect emotions. While the word embedding vectors like Word2Vec have been successfully employed in many NLP approaches, the word mover’s distance (WMD) is a method introduced recently to calculate the distance between two documents based on the embedded words. This thesis is investigating the ability to detect or classify emotions in sentences using word vectorization and distance measures. Our results confirm the novelty of using Word2Vec and WMD in predicting the emotions in short text. We propose a new methodology based on identifying “idealised” vectors that cap- ture the essence of an emotion; we define these vectors as having the minimal distance (using some metric function) between a vector and the embeddings of the text that contains the relevant emotion (e.g. a tweet, a sentence). We look for these vectors through searching the space of word embeddings using the covariance matrix adap- tation evolution strategy (CMA-ES). Our method produces state of the art results, surpassing classic supervised learning methods.

Item Type: Thesis (PhD)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Mohammed Alshahrani
Date Deposited: 06 Nov 2020 10:24
Last Modified: 06 Nov 2020 10:24
URI: http://repository.essex.ac.uk/id/eprint/29037

Actions (login required)

View Item View Item