Mhamed, Mustafa and Sutcliffe, Richard and Feng, Jun (2025) Benchmark Arabic news posts and analyzes Arabic sentiment through RMuBERT and SSL with AMCFFL technique. Egyptian Informatics Journal, 29. p. 100601. DOI https://doi.org/10.1016/j.eij.2024.100601
Abstract
Sentiment analysis, which aims to extract emotions from textual data, and text recognition are two of the most common tasks in natural language processing. These emerging technologies have been developed and employed in various fields, including marketing, health care, and policy making. However, with the growth of social media platforms and the resulting flow of data, especially in Arabic, substantial difficulties have emerged that call for new frameworks. These difficulties include the lack of datasets drawn from news platforms, the complex structure of the Arabic language, complications in classification, and system challenges in machine learning, deep learning, and online analysis tools. This paper provides a new framework that helps address Arabic sentiment analysis (ASA) challenges and supports various tasks based on state-of-the-art ASA. First, it presents a new collection, ANP5, of Arabic news posts gathered from several Arabic platforms, then uses SSL with the AMCFFL technique to analyze Arabic sentiment and generate a second dataset, ANPS2. Next, machine learning (ML) classifiers were applied; RF and SVM performed best among them, with an accuracy of 82.00%, although their per-class measurement distributions differ (Experiment 1). Following that, the deep learning (DL) models BiGRU, CNN-LSTM, LSTM, and CNN achieved accuracies of 88.10%, 89.30%, 89.85%, and 90.10%, respectively (Experiment 2). Experiments 1 and 2 form the initial benchmark classification and serve as the first baseline. Afterward, a new model, RMuBERT, was developed and compared with four transformer models on the two datasets, achieving accuracies of 90.87% on ANPS2 and 90.33% on ANP5 and outperforming the baselines (Experiment 3). RMuBERT was further tested on Arabic corpora that differ in number of classes, text length, and size: ArSarcasm (3 classes), STD (2 classes), AJGT (2 classes), and AAQ (2 classes), where it achieved accuracies of 77.76%, 91.79%, 94.07%, and 93.48%, respectively.
Again, RMuBERT outperformed the baselines (Experiment 4). Finally, on the largest Arabic sentiment corpus, containing six million Arabic tweets, RMuBERT's performance reached up to 91.12% while requiring less training time (Experiment 5).
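The abstract notes that RF and SVM share the same overall accuracy while their per-class measurements differ. The minimal sketch below illustrates how that can happen, using hypothetical toy labels and predictions (not data from the paper); the class names `pos`, `neg`, `neu` and the helper functions `accuracy` and `per_class_recall` are illustrative assumptions, not the authors' code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted instances / gold instances of that class."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / support[c] for c in sorted(support)}


# Hypothetical gold labels for ten posts (toy data, not from ANP5/ANPS2).
gold = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neu", "neu", "neu"]

# Hypothetical classifier A: perfect on pos/neg, weak on neu.
pred_a = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neu", "pos", "neg"]

# Hypothetical classifier B: perfect on neg/neu, weak on pos.
pred_b = ["pos", "pos", "neu", "neg", "neg", "neg", "neg", "neu", "neu", "neu"]

for name, pred in (("A", pred_a), ("B", pred_b)):
    # Both reach the same accuracy, but their per-class recall differs.
    print(name, accuracy(gold, pred), per_class_recall(gold, pred))
```

Both toy classifiers score 0.8 overall, yet A misses most neutral posts while B misses half of the positive ones, which is why per-class metrics are reported alongside accuracy.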
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Natural language processing; Arabic sentiment analysis; SSL; RMuBERT; ANP5; ANPS2 |
Subjects: | Z Bibliography. Library Science. Information Resources > ZZ OA Fund (articles) |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 09 May 2025 15:12 |
Last Modified: | 09 May 2025 15:12 |
URI: | http://repository.essex.ac.uk/id/eprint/40855 |
Available files
Filename: 1-s2.0-S1110866524001646-main.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0