The QMUL/HRBDT contribution to the NADI Arabic Dialect Identification Shared Task

Aloraini, Abdulrahman and Poesio, Massimo and Alhelbawy, Ayman (2020) The QMUL/HRBDT contribution to the NADI Arabic Dialect Identification Shared Task. In: Fifth Arabic Natural Language Processing Workshop (WANLP 2020), Barcelona, Spain.

2020.wanlp-1.31.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

We present the Arabic dialect identification system that we used for the country-level subtask of the NADI challenge. Our model consists of three components: BiLSTM-CNN, character-level TF-IDF, and topic modeling features. We represent each tweet using these features and feed them into a deep neural network. We then add an effective heuristic that improves the overall performance. We achieved an F1-Macro score of 20.77% and an accuracy of 34.32% on the test set. The model was also evaluated on the Arabic Online Commentary dataset, achieving results better than the state-of-the-art.
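As an illustration of the character-level TF-IDF component mentioned in the abstract, the following is a minimal sketch using only the Python standard library. The abstract does not state the n-gram range, the weighting scheme, or how these features are combined with the BiLSTM-CNN and topic modeling features, so the function names (`char_ngrams`, `char_tfidf`), the 1-to-3 n-gram range, the smoothed IDF formula, and the sample strings are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return every character n-gram of length n_min..n_max (range is an assumption)."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def char_tfidf(docs, n_min=1, n_max=3):
    """Return (vocabulary, rows), where each row is an L2-normalized TF-IDF vector."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    vocab = sorted(set().union(*counts))
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram appears.
    df = Counter(g for c in counts for g in c)
    # Smoothed inverse document frequency (one common variant, assumed here).
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1.0 for g in vocab}
    rows = []
    for c in counts:
        row = [c[g] * idf[g] for g in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows


# Hypothetical short texts standing in for tweets.
tweets = ["شلونك اليوم", "ازيك النهارده", "كيف حالك اليوم"]
vocab, X = char_tfidf(tweets)
print(len(vocab), len(X))
```

Each row of `X` could then be concatenated with other per-tweet features before classification; how the paper actually does this is described in the PDF, not in this record.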

Item Type: Conference or Workshop Item (Paper)
Additional Information: Published proceedings: Proceedings of the Fifth Arabic Natural Language Processing Workshop
Divisions: Faculty of Humanities > Law, School of
Depositing User: Jim Jamieson
Date Deposited: 09 Jun 2022 11:37
Last Modified: 09 Jun 2022 11:37
URI: http://repository.essex.ac.uk/id/eprint/32976
