Quteineh, Husam and Samothrakis, Spyridon and Sutcliffe, Richard (2022) Enhancing Task-Specific Distillation in Small Data Regimes through Language Generation. In: Proceedings of the 29th International Conference on Computational Linguistics, 2022-10-12 - 2022-10-17, Gyeongju, Republic of Korea.
Abstract
Large-scale pretrained language models have led to significant improvements in Natural Language Processing. Unfortunately, they come at the cost of high computational and storage requirements that complicate their deployment on low-resource devices. This issue can be addressed by distilling knowledge from larger models to smaller ones through pseudo-labels on task-specific datasets. However, this can be difficult for tasks with very limited data. To overcome this challenge, we present a novel approach where knowledge can be distilled from a teacher model to a student model through the generation of synthetic data. To do this, we first fine-tune the teacher and student models, as well as a Natural Language Generation (NLG) model, on the target task dataset. We then let the student and teacher work together to condition the NLG model to generate examples that can enhance the performance of the student. We tested our approach with two data generation methods: a) targeted generation using the Monte Carlo Tree Search (MCTS) algorithm, and b) a Non-Targeted Text Generation (NTTG) method. We evaluate the effectiveness of our approaches against a baseline that uses the BERT model for data augmentation through random word replacement. Testing on the SST-2, MRPC, YELP-2, DBpedia, and TREC-6 datasets, we consistently observed considerable improvements over the word-replacement baseline.
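The core loop described in the abstract, distilling the teacher into the student via soft pseudo-labels on generated text, can be illustrated with a minimal sketch. The model names, prompt, and hyperparameters below are assumptions for illustration only, and the sketch shows a simple non-targeted generation flavour rather than the paper's MCTS-guided targeted generation.

```python
# Minimal sketch: task-specific distillation using synthetic text generated
# by a language model and pseudo-labelled by a fine-tuned teacher.
# Model names, prompt, and hyperparameters are hypothetical placeholders.
import torch
import torch.nn.functional as F
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          AutoModelForSequenceClassification)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed to already be fine-tuned on the small task-specific dataset.
teacher = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").to(device).eval()
student = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased").to(device)
generator = AutoModelForCausalLM.from_pretrained("gpt2").to(device).eval()

gen_tok = AutoTokenizer.from_pretrained("gpt2")
cls_tok = AutoTokenizer.from_pretrained("bert-base-uncased")

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)

def generate_synthetic(n=8, max_new_tokens=40):
    """Non-targeted generation: sample candidate sentences from the NLG model."""
    prompt = gen_tok("The movie", return_tensors="pt").to(device)
    out = generator.generate(**prompt, do_sample=True, top_p=0.95,
                             num_return_sequences=n,
                             max_new_tokens=max_new_tokens,
                             pad_token_id=gen_tok.eos_token_id)
    return gen_tok.batch_decode(out, skip_special_tokens=True)

def distill_step(texts, temperature=2.0):
    """Train the student on the teacher's soft pseudo-labels for the texts."""
    batch = cls_tok(texts, return_tensors="pt", padding=True,
                    truncation=True).to(device)
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(**batch).logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(**batch).logits / temperature, dim=-1)
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

for step in range(3):
    texts = generate_synthetic()
    print(f"step {step}: KL loss = {distill_step(texts):.4f}")
```

In the paper's targeted variant, the teacher and student jointly score generated candidates so that MCTS steers the NLG model toward examples that most improve the student; the sketch above omits that search and simply distils on sampled text.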
| Item Type: | Conference or Workshop Item (Paper) |
|---|---|
| Additional Information: | Published proceedings: _not provided_ |
| Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 19 Jan 2023 13:56 |
| Last Modified: | 24 Nov 2023 21:20 |
| URI: | http://repository.essex.ac.uk/id/eprint/34115 |
Available files
Filename: 2022.coling-1.520.pdf
Licence: Creative Commons: Attribution 3.0