Research Repository

The Use of Latent Semantic Indexing to Cluster Documents into Their Subject Areas

Antai, R and Fox, C and Kruschwitz, U (2011) The Use of Latent Semantic Indexing to Cluster Documents into Their Subject Areas. In: UNSPECIFIED, ? - ?.

tcla-3.pdf - Submitted Version



Keyword-matching information retrieval systems are plagued with problems of noise in the document collection, arising from synonymy and polysemy. This noise tends to hide the latent structure of the documents, reducing the accuracy of information retrieval systems and making it difficult for clustering algorithms to pick up on shared concepts and effectively cluster similar documents. Latent Semantic Analysis (LSA), through its use of Singular Value Decomposition, reduces the dimension of the document space, mapping it onto a smaller concept space devoid of this noise and making it easier to group similar documents together. This work is an exploratory report on the use of LSA to cluster a small dataset of documents according to their topic areas, to see how LSA fares in comparison to clustering with a clustering package without LSA.
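
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the choice of k = 2 concepts, the 0.9 similarity threshold, and the helper names cosine_matrix and group_by_similarity are all assumptions made for illustration, and the greedy grouping is only a stand-in for a real clustering package. The sketch builds a term-document matrix, truncates its Singular Value Decomposition, and compares document similarity in keyword space with similarity in the reduced concept space.

```python
import numpy as np

# Hypothetical toy corpus (not from the paper): three documents about cars
# and three about astronomy. Documents 0 and 1 share no terms at all.
docs = [
    "car engine",
    "automobile wheel",
    "car automobile engine wheel",
    "galaxy star planet",
    "star planet",
    "galaxy star",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

# Term-document matrix of raw counts (rows: terms, columns: documents).
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[index[w], j] += 1.0


def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def group_by_similarity(sim, threshold=0.9):
    """Greedy grouping: join the first earlier seed that is similar enough,
    otherwise start a new group. A stand-in for a clustering package."""
    labels, seeds = [], []
    for i in range(sim.shape[0]):
        for label, seed in enumerate(seeds):
            if sim[i, seed] >= threshold:
                labels.append(label)
                break
        else:
            seeds.append(i)
            labels.append(len(seeds) - 1)
    return labels


# Similarity in the original keyword space.
raw_sim = cosine_matrix(A.T)

# LSA: keep only the k largest singular values (k = 2 is an assumption).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
doc_vecs = Vt[:k].T * s[:k]  # one k-dimensional concept vector per document
lsa_sim = cosine_matrix(doc_vecs)

raw_labels = group_by_similarity(raw_sim)
lsa_labels = group_by_similarity(lsa_sim)

print("keyword-space groups:", raw_labels)
print("concept-space groups:", lsa_labels)
```

In this toy corpus, documents 0 and 1 have no words in common, so their keyword-space similarity is zero, yet both co-occur with the terms of document 2. After truncation they fall along the same concept direction and are grouped together, while the astronomy documents form a separate group.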

Item Type: Conference or Workshop Item (UNSPECIFIED)
Additional Information: Published proceedings: _not provided_ - Notes:
Uncontrolled Keywords: Latent Semantic Indexing; Singular Value Decomposition; Information Retrieval; Document Clustering
Subjects: P Language and Literature > P Philology. Linguistics
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health
Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor: Elements
Depositing User: Elements
Date Deposited: 03 Jul 2013 08:55
Last Modified: 23 Sep 2022 19:05
