Amorim, RC and Hennig, C (2015) Recovering the number of clusters in data sets with noise features using feature rescaling factors. Information Sciences, 324. pp. 126-145. DOI https://doi.org/10.1016/j.ins.2015.06.039
Abstract
In this paper we introduce three methods for re-scaling data sets, aimed at increasing the likelihood that clustering validity indexes return the true number of spherical Gaussian clusters when the data contain additional noise features. Our methods obtain feature re-scaling factors that take into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance in different clusters. We experiment with the Silhouette (using the squared Euclidean distance, the Manhattan distance, and the pth power of the Minkowski distance), Dunn's, Calinski–Harabasz and Hartigan indexes on data sets of spherical Gaussian clusters, with and without noise features. We conclude that our methods do increase the chances of estimating the true number of clusters in a data set.
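The abstract does not give the re-scaling formulas, so the following is only a minimal sketch of the baseline procedure such methods plug into: run k-means for a range of k and keep the k whose partition maximises the average Silhouette width under a chosen dissimilarity (squared Euclidean, Manhattan, or the pth power of the Minkowski distance). The feature re-scaling step itself is not reproduced. All function names, the deterministic farthest-point initialisation, and the toy data are hypothetical choices made for illustration, not the authors' implementation or experimental setup.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def sq_euclidean(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a: Point, b: Point) -> float:
    """Manhattan (city-block) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski_pth_power(p: float) -> Distance:
    """Minkowski distance raised to the power p: sum_v |a_v - b_v|^p (no p-th root)."""
    def dist(a: Point, b: Point) -> float:
        return sum(abs(x - y) ** p for x, y in zip(a, b))
    return dist


def kmeans(data: List[Point], k: int, iters: int = 100) -> List[int]:
    """Plain Lloyd's k-means; the farthest-point seeding is an assumption of this sketch."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(data, key=lambda x: min(sq_euclidean(x, c) for c in centroids))
        centroids.append(list(far))
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_euclidean(x, centroids[j])) for x in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(data: List[Point], labels: List[int], dist: Distance) -> float:
    """Average Silhouette width under an arbitrary dissimilarity `dist`."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return -1.0  # Silhouette is undefined for a single cluster
    n = len(data)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean dissimilarity to the point's own cluster.
        a = sum(dist(data[i], data[j]) for j in own) / len(own)
        # b: smallest mean dissimilarity to any other cluster.
        b = min(
            sum(dist(data[i], data[j]) for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


def estimate_k(data: List[Point], k_max: int, dist: Distance = sq_euclidean) -> int:
    """Return the k in 2..k_max whose k-means partition has the largest average Silhouette."""
    best_k, best_s = 2, -math.inf
    for k in range(2, k_max + 1):
        s = silhouette(data, kmeans(data, k), dist)
        if s > best_s:
            best_k, best_s = k, s
    return best_k


# Toy data: three well-separated groups in two features, no noise features.
blobs = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (20, 0), (20, 1), (21, 0), (21, 1)]
print(estimate_k(blobs, 5))  # → 3
```

Passing `manhattan` or `minkowski_pth_power(p)` as `dist` changes only the dissimilarity used inside the Silhouette; in this sketch k-means itself always uses the squared Euclidean distance. Appending uninformative noise columns to `blobs` is the situation in which, according to the abstract, re-scaling the features before computing the index helps.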
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Feature re-scaling; Clustering; K-Means; Cluster validity index; Feature weighting |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 18 Sep 2017 13:46 |
Last Modified: | 30 Oct 2024 19:34 |
URI: | http://repository.essex.ac.uk/id/eprint/20364 |
Available files
Filename: 1602.06989v1.pdf
Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 3.0