Aloraini, Abdulrahman and Poesio, Massimo and Alhelbawy, Ayman (2020) The QMUL/HRBDT contribution to the NADI Arabic Dialect Identification Shared Task. In: Fifth Arabic Natural Language Processing Workshop (WANLP 2020), Barcelona, Spain.
Abstract
We present the Arabic dialect identification system that we used for the country-level subtask of the NADI challenge. Our model consists of three components: BiLSTM-CNN, character-level TF-IDF, and topic modeling features. We represent each tweet using these features and feed them into a deep neural network. We then add an effective heuristic that improves the overall performance. We achieved an F1-Macro score of 20.77% and an accuracy of 34.32% on the test set. The model was also evaluated on the Arabic Online Commentary dataset, achieving results better than the state-of-the-art.
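To make the character-level TF-IDF component more concrete, here is a minimal sketch using only the Python standard library. It is not the authors' implementation: the n-gram range (1 to 3), the smoothed IDF formula, the L2 normalisation, and the helper names `char_ngrams`, `tfidf_vectors`, and `cosine` are all assumptions chosen for illustration, and the abstract does not specify any of them.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return all character n-grams of length n_min..n_max (assumed range)."""
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def tfidf_vectors(docs, n_min=1, n_max=3):
    """Build L2-normalised TF-IDF vectors (sparse dicts) over character n-grams."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    n_docs = len(docs)
    vectors = []
    for c in counts:
        # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1
        vec = {
            g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1)
            for g, tf in c.items()
        }
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({g: v / norm for g, v in vec.items()} if norm else vec)
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * b.get(g, 0.0) for g, w in a.items())
```

In a system like the one described, vectors of this kind would be combined with the BiLSTM-CNN and topic-model features before classification; how that combination is done is not stated in the abstract, so it is left out of the sketch.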
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Additional Information: | Published proceedings: Proceedings of the Fifth Arabic Natural Language Processing Workshop |
Divisions: | Faculty of Humanities > Essex Law School |
Depositing User: | Jim Jamieson |
Date Deposited: | 09 Jun 2022 11:37 |
Last Modified: | 09 Jun 2022 11:37 |
URI: | http://repository.essex.ac.uk/id/eprint/32976 |
Available files
Filename: 2020.wanlp-1.31.pdf
Licence: Creative Commons: Attribution 3.0