Pelicon, Andraž and Karan, Mladen and Shekhar, Ravi and Purver, Matthew and Pollak, Senja (2024) Denoising Labeled Data for Comment Moderation Using Active Learning. In: LREC-COLING 2024 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024-05-20 - 2024-05-25, Torino, Italy.
Abstract
Noisily labeled textual data is abundant on internet platforms that allow user-created content. Training models on such data, for example offensive language detection models for comment moderation, can be difficult because noise in the labels may prevent the model from converging. In this work, we propose using active learning methods to denoise training data. The goal is to use active learning to sample the most informative examples with noisy labels and send them to the oracle for reannotation, thus reducing the overall cost of reannotation. In this setting, we tested three existing active learning methods: DBAL, Variance of Gradients (VoG), and BADGE. The proposed approach to data denoising is tested on the problem of offensive language detection. We observe that active learning can be used effectively for data denoising; however, care should be taken when choosing the algorithm for this purpose.
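The selection-and-reannotation loop described in the abstract can be illustrated with a minimal sketch. It does not implement DBAL, VoG, or BADGE; as a stand-in it ranks examples by plain predictive entropy, and all function names, the toy probabilities, and the lambda oracle are hypothetical choices made only for illustration.

```python
import numpy as np


def predictive_entropy(probs):
    """Entropy of each row of class probabilities; higher means less certain."""
    probs = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(probs * np.log(probs)).sum(axis=1)


def select_for_reannotation(probs, budget):
    """Indices of the `budget` most uncertain examples, most uncertain first."""
    scores = predictive_entropy(probs)
    order = np.argsort(-scores, kind="stable")
    return order[:budget]


def denoise_round(labels, probs, budget, oracle):
    """One round: pick uncertain examples and overwrite their labels
    with the labels returned by the oracle (assumed to be a callable)."""
    picked = select_for_reannotation(probs, budget)
    cleaned = labels.copy()
    for i in picked:
        cleaned[i] = oracle(int(i))
    return cleaned, picked


# Toy data (assumed): model probabilities for 4 comments, 2 classes.
probs = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.50, 0.50],
])
noisy_labels = np.array([0, 0, 1, 0])
true_labels = np.array([0, 1, 1, 1])  # what a perfect oracle would return

cleaned, picked = denoise_round(
    noisy_labels, probs, budget=2, oracle=lambda i: true_labels[i]
)
print("sent to oracle:", picked.tolist())
print("labels after round:", cleaned.tolist())
```

With a budget of two, only the two least certain comments are sent for reannotation, which is the cost saving the abstract refers to; a real setup would retrain the model between rounds and swap in the acquisition function under study.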
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Additional Information: | Published proceedings: _not provided_ |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 02 Aug 2024 11:49 |
Last Modified: | 02 Aug 2024 11:49 |
URI: | http://repository.essex.ac.uk/id/eprint/38104 |
Available files
Filename: 2024.lrec-main.413.pdf
Licence: Creative Commons: Attribution 4.0