Research Repository

The Use of Latent Semantic Indexing to Cluster Documents into Their Subject Areas

Antai, R and Fox, C and Kruschwitz, U (2011) 'The Use of Latent Semantic Indexing to Cluster Documents into Their Subject Areas.' In: Proceedings of the Fifth Language Technology Conference. Springer.

Full text: tcla-3.pdf (Submitted Version, 2MB)

Abstract

Keyword-matching information retrieval systems are plagued by noise in the document collection arising from synonymy and polysemy. This noise tends to hide the latent structure of the documents, reducing the accuracy of information retrieval systems and making it difficult for clustering algorithms to pick up on shared concepts and effectively cluster similar documents. Latent Semantic Analysis (LSA), through its use of Singular Value Decomposition (SVD), reduces the dimension of the document space, mapping it onto a smaller concept space that filters out much of this noise and making it easier to group similar documents together. This work is an exploratory report on the use of LSA to cluster a small dataset of documents according to their topic areas, to see how LSA fares in comparison with clustering performed by a clustering package without LSA.
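The general LSA idea summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' code or dataset. The toy term-document matrix, the choice of k = 2 concepts, the cosine threshold of 0.9 and the greedy "leader" clustering routine are all hypothetical assumptions chosen for illustration, and NumPy's `numpy.linalg.svd` stands in for whatever SVD implementation the paper used. Documents d0 ("car engine") and d1 ("automobile tire") share no terms, so their raw cosine similarity is zero; after projecting onto the top-k singular vectors they land on the same concept axis.

```python
import numpy as np

# Hypothetical toy term-document matrix: rows are terms, columns are documents.
A = np.array([
    # d0 d1 d2 d3 d4 d5
    [1, 0, 1, 0, 0, 0],  # car
    [0, 1, 1, 0, 0, 0],  # automobile
    [1, 0, 1, 0, 0, 0],  # engine
    [0, 1, 1, 0, 0, 0],  # tire
    [0, 0, 0, 1, 0, 1],  # apple
    [0, 0, 0, 0, 1, 1],  # banana
    [0, 0, 0, 1, 0, 1],  # fruit
    [0, 0, 0, 0, 1, 1],  # juice
], dtype=float)


def lsa_doc_vectors(A, k):
    """Map each document into a k-dimensional concept space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one
    # document's coordinates (Sigma_k V_k^T, transposed).
    return (np.diag(s[:k]) @ Vt[:k, :]).T


def cosine(x, y):
    """Cosine similarity, treating a zero vector as dissimilar to everything."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    if nx == 0 or ny == 0:
        return 0.0
    return float(x @ y / (nx * ny))


def leader_cluster(vectors, threshold):
    """Hypothetical single-pass clustering: each document joins the first
    cluster whose first member (its leader) is at least `threshold` similar,
    otherwise it starts a new cluster."""
    clusters = []
    for j, v in enumerate(vectors):
        for members in clusters:
            if cosine(vectors[members[0]], v) >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters


raw_docs = A.T                        # documents in the original term space
lsa_docs = lsa_doc_vectors(A, k=2)    # documents in the 2-D concept space

print(leader_cluster(raw_docs, 0.9))  # → [[0], [1], [2], [3], [4], [5]]
print(leader_cluster(lsa_docs, 0.9))  # → [[0, 1, 2], [3, 4, 5]]
```

On this contrived matrix the same clustering rule yields six singleton clusters in term space but two topic clusters in concept space. The outcome depends entirely on the toy data and threshold, so it illustrates the mechanism rather than any result reported in the paper.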

Item Type: Book Section
Uncontrolled Keywords: Latent Semantic Indexing; Singular Value Decomposition; Information Retrieval; Document Clustering
Subjects: P Language and Literature > P Philology. Linguistics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Date Deposited: 03 Jul 2013 08:55
Last Modified: 17 Aug 2017 18:07
URI: http://repository.essex.ac.uk/id/eprint/4231
