Synthetic Data Generation and Impact Analysis of Machine Learning Models for Enhanced Credit Card Fraud Detection

Khaled, Ahmed Abdullah and Hasan, Md Mahmudul and Islam, Shareeful and Papastergiou, Spyridon and Mouratidis, Haralambos (2024) Synthetic Data Generation and Impact Analysis of Machine Learning Models for Enhanced Credit Card Fraud Detection. In: AIAI 2024 IFIP International Conference on Artificial Intelligence Applications and Innovations, 2024-06-27 - 2024-06-30, Corfu, Greece.

Abstract

The financial industry is undergoing a substantial shift in its operating landscape because of the rapid integration of technology. This transformation brings potential risks and challenges. The rising incidence of online fraud is one of the key concerns for the sector, and it has been exacerbated by the growing prevalence of online payment methods on e-commerce platforms and other websites. Identifying credit card fraud is challenging because transactional data is highly imbalanced, which makes fraudulent activity hard to detect and predict. In this context, this paper presents a unique approach that creates a synthetic dataset to tackle the class imbalance problem in credit card fraud detection. The approach adopts the Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset. An experiment is performed using several ML models, including SVM (Support Vector Machines), KNN (K-Nearest Neighbours), and Random Forest, to demonstrate the feasibility of using synthetic data. In this study, we combine resampling techniques such as SMOTE, which oversamples the minority class, with ensemble methods and appropriate evaluation metrics such as the F1-score to address the data imbalance. The results of the experiment are compared with widely used public datasets to evaluate model performance. The analysis reveals a severe imbalance in the real ULB (Université Libre de Bruxelles) dataset, with the positive class (frauds) comprising a mere 0.172% of all transactions. The findings clearly show that the Random Forest model performs better than the other models, with outstanding precision, recall, accuracy, and F1 score values, in detecting fraudulent transactions and reducing false positives.
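To make the oversampling idea in the abstract concrete, the following is a minimal sketch of SMOTE-style interpolation and of the F1-score, written in plain Python. It is not the authors' implementation: the function names `smote` and `f1_score`, the parameter `k`, and the toy points are all assumptions chosen for illustration, and a real experiment would more likely rely on an established library.

```python
import random
from math import dist


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours.

    Illustrative sketch only; names and defaults are hypothetical.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of `base` (excluding itself).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (made up): 100 "legitimate" points and 4 "fraud" points.
legit = [(float(x), float(y)) for x in range(10) for y in range(10)]
fraud = [(20.0, 20.0), (21.0, 20.5), (20.5, 22.0), (22.0, 21.0)]

# Oversample the minority class until both classes have the same size.
new_fraud = smote(fraud, n_synthetic=len(legit) - len(fraud), k=3)
balanced_fraud = fraud + new_fraud
```

Because each synthetic point lies on a segment between two existing fraud points, the new samples stay inside the region already occupied by the minority class. In practice, oversampling is normally applied to the training split only, so that the evaluation metrics are computed on untouched data.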

Item Metadata

Item Type:	Conference or Workshop Item (Paper)
Uncontrolled Keywords:	Credit card fraud; Online Transactions; Synthetic Dataset; Imbalanced Dataset; Feature Transformation; Random Forest
Divisions:	Faculty of Science and Health; Faculty of Science and Health > Computer Science and Electronic Engineering, School of
SWORD Depositor:	Unnamed user with email elements@essex.ac.uk
Depositing User:	Unnamed user with email elements@essex.ac.uk
Date Deposited:	02 Oct 2024 13:04
Last Modified:	16 Aug 2025 06:13
URI:	http://repository.essex.ac.uk/id/eprint/39175

Available files

Accepted Version

Filename: Paper 25_ Camera Ready.pdf

