Unsupervised feature selection for large data sets

Cordeiro de Amorim, Renato (2019) Unsupervised feature selection for large data sets. Pattern Recognition Letters, 128. pp. 183-189. DOI https://doi.org/10.1016/j.patrec.2019.08.017

Abstract

The last decade saw a considerable increase in the availability of data. Unfortunately, this increase was overshadowed by various technical difficulties that arise when analysing large data sets. These include long processing times, large requirements for data storage, and other technical issues related to the analysis of high-dimensional data sets. As a consequence, reducing the cardinality of data sets (with minimum information loss) has become of interest to virtually any data scientist. Many feature selection algorithms have been introduced in the literature; however, there are two main issues with these. First, the vast majority of such algorithms require labelled samples to learn from. One should note that it is often too expensive to label a meaningful amount of data, particularly when dealing with large data sets. Second, these algorithms were not designed to deal with the volume of data we have nowadays. This paper introduces a novel unsupervised feature selection algorithm designed specifically to deal with large data sets. Our experiments demonstrate the superiority of our method.

Item Metadata

Item Type:	Article
Uncontrolled Keywords:	Unsupervised feature selection; Clustering; Big data
Divisions:	Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor:	Unnamed user with email elements@essex.ac.uk
Depositing User:	Unnamed user with email elements@essex.ac.uk
Date Deposited:	30 Aug 2019 09:15
Last Modified:	30 Oct 2024 16:16
URI:	http://repository.essex.ac.uk/id/eprint/25231

Available files

Accepted Version

Filename: 1-s2.0-S0167865518304963-main.pdf

Licence: Creative Commons: Attribution-Noncommercial-No Derivative Works 3.0
