Al-Ani, Jabir Alshehabi and Fasli, Maria (2019) Probabilistic Relational Supervised Topic Modelling using Word Embeddings. In: 2018 IEEE International Conference on Big Data (Big Data), 2018-12-10 - 2018-12-13, Seattle, WA, USA.
Abstract
The increasing pace of change in languages affects many applications and algorithms for text processing. Researchers in Natural Language Processing (NLP) have been striving for more generalized solutions that can cope with continuous change. This is even more challenging when applied to short text emanating from social media. Furthermore, social media increasingly exert a major influence on both the development and the use of language. Our work is motivated by the need to develop NLP techniques that can cope with the short, informal text used in social media and with the massive volume of textual data uploaded to social media daily. In this paper, we describe a novel approach to Short Text Topic Modelling that uses word embeddings and takes into account the informality of words in social media text, with the aim of reducing noise in messy text. We present a new algorithm derived from Term Frequency-Inverse Document Frequency (TF-IDF), named Term Frequency-Inverse Context Term Frequency (TF-ICTF). TF-ICTF relies on a probabilistic relation between words and context with respect to time. Our experimental work shows promising results against other state-of-the-art methods.
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Uncontrolled Keywords: | Topic Modeling; Term Frequency; Embeddings; TF-IDF; Short Text; Words Matching |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Mathematics, Statistics and Actuarial Science, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 19 Mar 2019 12:05 |
Last Modified: | 06 Nov 2024 06:21 |
URI: | http://repository.essex.ac.uk/id/eprint/24228 |
Available files
Filename: fasli_big_data_01.pdf