
The Influence of Text Pre-processing on Plagiarism Detection

Ceska, Z and Fox, C (2011) The Influence of Text Pre-processing on Plagiarism Detection. In: UNSPECIFIED, ? - ?.




This paper explores the influence of text pre-processing techniques on plagiarism detection. We examine stop-word removal, lemmatization, number replacement, synonymy recognition, and word generalization. We also look into the influence of punctuation and word order within N-grams. All these techniques are evaluated according to their impact on F1-measure and speed of execution. Our experiments were performed on a Czech corpus of plagiarized documents about politics. At the end of the paper, we propose what we consider to be the best combination of text pre-processing techniques.
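As a rough illustration of three of the techniques named in the abstract (stop-word removal, number replacement, and ignoring word order within N-grams), together with the F1-measure used for evaluation, a minimal Python sketch follows. It is not the authors' implementation: the function names, the `<num>` placeholder, and the small English stop-word list are assumptions made for this example, whereas the paper itself works with Czech text.

```python
import re

# Hypothetical English stop-word list, for illustration only;
# the paper's experiments use a Czech corpus.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}


def preprocess(text, remove_stop_words=True, replace_numbers=True):
    """Lowercase the text, drop punctuation, and apply optional steps."""
    tokens = re.findall(r"\w+", text.lower())  # \w+ skips punctuation
    result = []
    for token in tokens:
        if replace_numbers and token.isdigit():
            token = "<num>"  # assumed placeholder for any number
        if remove_stop_words and token in STOP_WORDS:
            continue
        result.append(token)
    return result


def ngrams(tokens, n=3, ignore_order=False):
    """Return the set of word N-grams; optionally ignore word order."""
    grams = set()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        # Sorting the words makes N-grams with permuted words compare equal.
        grams.add(tuple(sorted(gram)) if ignore_order else tuple(gram))
    return grams


def f1_measure(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

For example, `preprocess("The budget of 2011 is large.")` yields three tokens, and two documents can then be compared by the overlap of their `ngrams(...)` sets. Lemmatization, synonymy recognition, and word generalization need language-specific resources and are left out of this sketch.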

Item Type: Conference or Workshop Item (UNSPECIFIED)
Additional Information: Published proceedings: not provided
Uncontrolled Keywords: Plagiarism; Copy Detection; Natural Language Processing; Stop-words; Lemmatization; Synonymy; WordNet; Thesaurus
Subjects: P Language and Literature > P Philology. Linguistics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health
Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor: Elements
Depositing User: Elements
Date Deposited: 18 Oct 2012 22:44
Last Modified: 15 Jan 2022 01:04
