Amorim, Renato and Makarenkov, Vladimir (2023) On k-means iterations and Gaussian clusters. Neurocomputing, 553. p. 126547. DOI https://doi.org/10.1016/j.neucom.2023.126547
Amorim, Renato and Makarenkov, Vladimir (2023) On k-means iterations and Gaussian clusters. Neurocomputing, 553. p. 126547. DOI https://doi.org/10.1016/j.neucom.2023.126547
Amorim, Renato and Makarenkov, Vladimir (2023) On k-means iterations and Gaussian clusters. Neurocomputing, 553. p. 126547. DOI https://doi.org/10.1016/j.neucom.2023.126547
Abstract
Nowadays, k-means remains arguably the most popular clustering algorithm (Jain, 2010; Vouros et al., 2021). Two of its main properties are simplicity and speed in practice. Here, our main claim is that the average number of iterations k-means takes to converge (τ¯) is in fact very informative. We find this to be particularly interesting because τ¯ is always known when applying k-means but has never been, to our knowledge, used in the data analysis process. By experimenting with Gaussian clusters, we show that τ¯ is related to the structure of a data set under study. Data sets containing Gaussian clusters have a much lower τ¯ than those containing uniformly random data. In fact, we go considerably further and demonstrate a pattern of inverse correlation between τ¯ and the clustering quality. We illustrate the importance of our findings through two practical applications. First, we describe the cases in which τ¯ can be effectively used to identify irrelevant features present in a given data set or be used to improve the results of existing feature selection algorithms. Second, we show that there is a strong relationship between τ¯ and the number of clusters in a data set, and that this relationship can be used to find the true number of clusters it contains.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Clustering; Feature selection; k-means |
Divisions: | Faculty of Science and Health Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 24 Aug 2023 09:47 |
Last Modified: | 30 Oct 2024 21:25 |
URI: | http://repository.essex.ac.uk/id/eprint/36006 |
Available files
Filename: 1-s2.0-S0925231223006707-main.pdf
Licence: Creative Commons: Attribution 4.0