Antai, R and Fox, C and Kruschwitz, U (2011) The Use of Latent Semantic Indexing to Cluster Documents into Their Subject Areas. In: UNSPECIFIED, ? - ?.
Abstract
Keyword-matching information retrieval systems are plagued with problems of noise in the document collection, arising from synonymy and polysemy. This noise tends to hide the latent structure of the documents, reducing the accuracy of information retrieval systems and making it difficult for clustering algorithms to pick up on shared concepts and cluster similar documents effectively. Latent Semantic Analysis (LSA), through its use of Singular Value Decomposition, reduces the dimension of the document space, mapping it onto a smaller concept space devoid of this noise and making it easier to group similar documents together. This work is an exploratory report on the use of LSA to cluster a small dataset of documents according to their topic areas, to see how LSA fares in comparison to clustering with a clustering package without LSA.
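
As a rough illustration of the idea described in the abstract, the sketch below builds a term-document matrix for a toy corpus, truncates its Singular Value Decomposition to `k` concepts, and compares document similarity before and after the reduction. The corpus, the function names (`term_document_matrix`, `lsa_document_vectors`, `cosine`), and the choice `k = 2` are assumptions made purely for illustration; they are not the authors' data, code, or settings.

```python
import numpy as np

# Hypothetical toy corpus: documents 0-2 are about cars, 3-4 about banking.
# Documents 0 and 1 share no words ("car" vs "automobile" is the synonymy case).
docs = [
    "car repair",
    "automobile garage",
    "car automobile repair garage",
    "bank loan",
    "bank interest",
]

def term_document_matrix(docs):
    """Return a (terms x documents) raw count matrix and the vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return A, vocab

def lsa_document_vectors(A, k):
    """Project documents onto the top-k singular directions (concept space)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Each row of the result is one document expressed in k concept coordinates.
    return (s[:k, None] * Vt[:k, :]).T

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

A, vocab = term_document_matrix(docs)
reduced = lsa_document_vectors(A, k=2)

# Similarity of documents 0 and 1 in the raw term space vs. the concept space.
raw_sim = cosine(A[:, 0], A[:, 1])
lsa_sim = cosine(reduced[0], reduced[1])
# Similarity of a car document and a banking document in the concept space.
cross_sim = cosine(reduced[0], reduced[3])

print("raw term space, doc 0 vs doc 1:", raw_sim)
print("concept space,  doc 0 vs doc 1:", lsa_sim)
print("concept space,  doc 0 vs doc 3:", cross_sim)
```

In this toy setting the two car documents have no terms in common, so keyword matching sees them as unrelated, while the reduced representation places them close together because a third document links their vocabularies. The reduced vectors could then be passed to any clustering routine; which package the authors used is not stated here.
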
Item Type: | Conference or Workshop Item (UNSPECIFIED) |
---|---|
Additional Information: | Published proceedings: _not provided_ - Notes: |
Uncontrolled Keywords: | Latent Semantic Indexing; Singular Value Decomposition; Information Retrieval; Document Clustering |
Subjects: | P Language and Literature > P Philology. Linguistics; Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 03 Jul 2013 08:55 |
Last Modified: | 16 May 2024 17:51 |
URI: | http://repository.essex.ac.uk/id/eprint/4231 |
Available files
Filename: tcla-3.pdf