Research Repository

A new term weighting scheme based on class specific document frequency for document representation and classification

Plansangket, S and Gan, JQ (2015) A new term weighting scheme based on class specific document frequency for document representation and classification. In: 2015 7th Computer Science and Electronic Engineering Conference (CEEC 2015).

Full text not available from this repository.

Abstract

Document classification is usually more challenging than numerical data classification because documents are much harder than numerical data to represent effectively for classification purposes. The vector space model (VSM) has been widely used for document representation in classification: a document is represented by a vector of feature values based on a bag of words. This paper proposes a new feature for document representation under the VSM framework, class specific document frequency (CSDF), which leads to a novel term weighting scheme based on term frequency (TF), term presence (TP), and the newly proposed feature. The experimental results show that the proposed features, CSDF and TF-CSDF, improve document classification performance in comparison with other widely used VSM document representations.
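
The abstract does not give the formulas, so the following is only a minimal sketch of the kind of quantities it names, not the authors' method. It assumes that a class specific document frequency can be approximated by counting, for each class, how many of that class's training documents contain a term, and that a TF-based weight can be formed by multiplying term frequency by that count. The function names, the whitespace tokenizer, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems do more).
    return text.lower().split()


def term_frequency(tokens):
    # TF: how many times each term occurs in one document.
    return Counter(tokens)


def term_presence(tokens):
    # TP: 1 for every term that occurs at least once in the document.
    return {term: 1 for term in set(tokens)}


def class_document_frequency(docs, labels):
    # For each class, count the documents of that class containing each term.
    # This is an illustrative stand-in, not the paper's CSDF definition.
    counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        counts[label].update(set(tokenize(text)))
    return counts


def tf_times_class_df(text, class_df, label):
    # Hypothetical weighting: TF multiplied by the per-class document count.
    tf = term_frequency(tokenize(text))
    return {term: tf[term] * class_df[label][term] for term in tf}


# Toy corpus (invented for illustration only).
docs = ["cheap pills cheap offer", "meeting agenda attached", "cheap offer today"]
labels = ["spam", "ham", "spam"]

class_df = class_document_frequency(docs, labels)
weights = tf_times_class_df(docs[0], class_df, "spam")
print(weights)
```

In this toy corpus, "cheap" appears in both spam documents and in no ham document, so its per-class count separates the two classes, which is the intuition behind weighting terms by class-level statistics rather than corpus-level ones.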

Item Type: Conference or Workshop Item (Paper)
Additional Information: Published proceedings: 2015 7th Computer Science and Electronic Engineering Conference, CEEC 2015 - Conference Proceedings
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Jim Jamieson
Date Deposited: 24 Nov 2015 13:43
Last Modified: 30 Mar 2021 23:15
URI: http://repository.essex.ac.uk/id/eprint/15515
