Research Repository

The Influence of Text Pre-processing on Plagiarism Detection

Ceska, Z and Fox, C (2011) The Influence of Text Pre-processing on Plagiarism Detection. In: UNSPECIFIED.


This paper explores the influence of text pre-processing techniques on plagiarism detection. We examine stop-word removal, lemmatization, number replacement, synonymy recognition, and word generalization. We also investigate the influence of punctuation and of word order within N-grams. All these techniques are evaluated according to their impact on F1-measure and on speed of execution. Our experiments were performed on a Czech corpus of plagiarized documents about politics. At the end of this paper, we propose what we consider to be the best combination of text pre-processing techniques.
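Several of the pre-processing options named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list and the `CANONICAL` table are hypothetical toy data: the table stands in for a real Czech lemmatizer and thesaurus by mapping inflected forms or synonyms to one representative word. Jaccard similarity over word trigrams is an assumed comparison measure chosen only for illustration. The function names `preprocess`, `ngrams`, `jaccard`, and `f1_measure` are also illustrative.

```python
import re

# Toy stop-word list for illustration only (hypothetical); a real system
# would load a complete Czech stop-word list.
STOP_WORDS = {"a", "v", "se", "na", "je", "to", "že", "s", "z"}

# Hypothetical lookup table standing in for a Czech lemmatizer and thesaurus:
# it maps inflected forms or synonyms to one canonical representative.
CANONICAL = {"vlády": "vláda", "vládou": "vláda", "kabinet": "vláda"}

NUMBER_TOKEN = "<NUM>"


def preprocess(text, remove_stop_words=True, replace_numbers=True,
               normalize_words=True, keep_punctuation=False):
    """Tokenize text and apply the selected normalization steps."""
    # \w+ matches Unicode word characters, so Czech diacritics are kept.
    pattern = r"\w+|[^\w\s]" if keep_punctuation else r"\w+"
    tokens = []
    for tok in re.findall(pattern, text.lower()):
        if replace_numbers and tok.isdigit():
            tok = NUMBER_TOKEN  # every number becomes the same placeholder
        elif normalize_words:
            tok = CANONICAL.get(tok, tok)  # lemma / synonym normalization
        if remove_stop_words and tok in STOP_WORDS:
            continue
        tokens.append(tok)
    return tokens


def ngrams(tokens, n=3, ignore_order=False):
    """Return the set of word n-grams; optionally ignore order inside each n-gram."""
    grams = set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if ignore_order:
            gram = tuple(sorted(gram))  # "b a" and "a b" become the same n-gram
        grams.add(gram)
    return grams


def jaccard(a, b):
    """Jaccard similarity of two n-gram sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def f1_measure(predicted, actual):
    """F1 of the flagged document pairs against the truly plagiarized pairs."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    original = "Vláda schválila rozpočet na rok 2011."
    suspect = "Vláda schválila rozpočet na rok 2012."
    for replace in (True, False):
        a = ngrams(preprocess(original, replace_numbers=replace))
        b = ngrams(preprocess(suspect, replace_numbers=replace))
        print(f"replace_numbers={replace}: {jaccard(a, b)}")
```

Running it prints `replace_numbers=True: 1.0` and then `replace_numbers=False: 0.5`. Replacing numbers with a single placeholder makes two sentences that differ only in the year look identical, which shows how a pre-processing choice can change the similarity score and, across a corpus, the resulting F1-measure.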

Item Type: Conference or Workshop Item (UNSPECIFIED)
Additional Information: Published proceedings: _not provided_
Uncontrolled Keywords: Plagiarism; Copy Detection; Natural Language Processing; Stop-words; Lemmatization; Synonymy; WordNet; Thesaurus
Subjects: P Language and Literature > P Philology. Linguistics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Date Deposited: 18 Oct 2012 22:44
Last Modified: 04 Sep 2019 16:15
