Identifying meaningful clusters in malware data

Amorim, Renato and Lopez Ruiz, Carlos D (2021) Identifying meaningful clusters in malware data. Expert Systems with Applications, 177. DOI https://doi.org/10.1016/j.eswa.2021.114971

Abstract

Finding meaningful clusters in drive-by-download malware data is a particularly difficult task. Malware data tends to contain overlapping clusters with widely varying cardinality. This happens because there can be considerable similarity between malware samples (some are even said to belong to the same family), and these tend to appear in bursts. Clustering algorithms are usually applied to normalised data sets. However, normalisation only aims to give features with different ranges a similar contribution to the clustering. It does not favour more meaningful features over less meaningful ones, an effect one should perhaps expect of the data pre-processing stage. In this paper we introduce a method that deals precisely with this problem. It is an iterative data pre-processing method that helps increase the separation between clusters. It does so by calculating the within-cluster degree of relevance of each feature and then using these values as data rescaling factors. Repeating this until convergence separated our malware data into clear clusters, leading to a higher average silhouette width.
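The loop the abstract describes (cluster, compute a within-cluster relevance for each feature, rescale, repeat until convergence) can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: plain k-means as the clustering step, inverse within-cluster dispersion as the relevance measure, a stable partition as the convergence test, and the function names `kmeans`, `feature_weights`, `iterative_rescale` and `average_silhouette`. The toy data is made up and is not malware data.

```python
import math

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(data, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centres = [list(data[0])]
    while len(centres) < k:
        centres.append(list(max(data, key=lambda p: min(sq_dist(p, c) for c in centres))))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in data]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(data, labels) if l == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def feature_weights(data, labels, k, eps=1e-9):
    """Assumed relevance: inverse within-cluster dispersion, summing to 1 per cluster."""
    n_features = len(data[0])
    weights = []
    for c in range(k):
        members = [p for p, l in zip(data, labels) if l == c]
        if not members:
            weights.append([1.0 / n_features] * n_features)
            continue
        centre = [sum(col) / len(members) for col in zip(*members)]
        inv = [1.0 / (sum((p[v] - centre[v]) ** 2 for p in members) + eps)
               for v in range(n_features)]
        total = sum(inv)
        weights.append([w / total for w in inv])
    return weights

def iterative_rescale(data, k, max_rounds=20):
    """Rescale the (already normalised) data by per-cluster weights until labels stop changing."""
    labels = kmeans(data, k)
    for _ in range(max_rounds):
        weights = feature_weights(data, labels, k)
        rescaled = [[x * w for x, w in zip(p, weights[l])] for p, l in zip(data, labels)]
        new_labels = kmeans(rescaled, k)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, weights, rescaled

def average_silhouette(data, labels):
    """Average silhouette width using Euclidean distance."""
    scores = []
    for i, p in enumerate(data):
        dists = {}
        for j, q in enumerate(data):
            if i != j:
                dists.setdefault(labels[j], []).append(math.sqrt(sq_dist(p, q)))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # singleton cluster or only one cluster present
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in dists.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Made-up toy data: feature 0 separates two groups, feature 1 is noise.
data = [[0.0, 0.9], [0.2, -0.8], [0.1, 0.5], [0.3, -0.4],
        [5.0, 0.7], [5.2, -0.9], [5.1, 0.3], [5.3, -0.6]]
labels, weights, rescaled = iterative_rescale(data, k=2)
before = average_silhouette(data, labels)
after = average_silhouette(rescaled, labels)
print(labels, weights, before, after)
```

In this sketch the weights are always recomputed from the original data under the current partition, rather than compounding rescalings round after round. This is a design choice made here for numerical stability, not something the abstract specifies.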

Item Metadata

Item Type:	Article
Uncontrolled Keywords:	feature rescaling, drive-by-download malware, clustering
Divisions:	Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor:	Unnamed user with email elements@essex.ac.uk
Depositing User:	Unnamed user with email elements@essex.ac.uk
Date Deposited:	28 Feb 2020 14:24
Last Modified:	30 Oct 2024 16:28
URI:	http://repository.essex.ac.uk/id/eprint/27045

Available files

Accepted Version

Filename: FWInMalwareAnalysis.pdf

Licence: Creative Commons: Attribution-Noncommercial 3.0

