Ceska, Z and Fox, C (2011) The Influence of Text Pre-processing on Plagiarism Detection. In: UNSPECIFIED, ? - ?.
Abstract
This paper explores the influence of text pre-processing techniques on plagiarism detection. We examine stop-word removal, lemmatization, number replacement, synonymy recognition, and word generalization. We also look into the influence of punctuation and word order within N-grams. All these techniques are evaluated according to their impact on the F1-measure and on speed of execution. Our experiments were performed on a Czech corpus of plagiarized documents about politics. At the end of this paper, we propose what we consider to be the best combination of text pre-processing techniques.
Item Type: | Conference or Workshop Item (UNSPECIFIED) |
---|---|
Additional Information: | Published proceedings: _not provided_ - Notes: |
Uncontrolled Keywords: | Plagiarism; Copy Detection; Natural Language Processing; Stop-words; Lemmatization; Synonymy; WordNet; Thesaurus |
Subjects: | P Language and Literature > P Philology. Linguistics; Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 18 Oct 2012 22:44 |
Last Modified: | 16 May 2024 17:51 |
URI: | http://repository.essex.ac.uk/id/eprint/4019 |
Available files
Filename: R09-1011.pdf