Research Repository

Unsupervised feature selection for large data sets

Cordeiro de Amorim, Renato (2019) 'Unsupervised feature selection for large data sets.' Pattern Recognition Letters. ISSN 0167-8655

1-s2.0-S0167865518304963-main.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.


The last decade saw a considerable increase in the availability of data. Unfortunately, this increase was overshadowed by various technical difficulties that arise when analysing large data sets. These include long processing times, large requirements for data storage, and other technical issues related to the analysis of high-dimensional data sets. Consequently, reducing the cardinality of data sets (with minimum information loss) has become of interest to virtually any data scientist. Many feature selection algorithms have been introduced in the literature; however, there are two main issues with these. First, the vast majority of such algorithms require labelled samples to learn from. One should note that it is often too expensive to label a meaningful amount of data, particularly when dealing with large data sets. Second, these algorithms were not designed to deal with the volume of data we have nowadays. This paper introduces a novel unsupervised feature selection algorithm designed specifically to deal with large data sets. Our experiments demonstrate the superiority of our method.

Item Type: Article
Uncontrolled Keywords: Unsupervised feature selection, Clustering, Big data
Divisions: Faculty of Science and Health > Computer Science and Electronic Engineering, School of
Depositing User: Elements
Date Deposited: 30 Aug 2019 09:15
Last Modified: 27 Aug 2020 01:00
