Bari, Nimra and Saleem, Tahir and Shah, Munam and Algarni, Abdulmohsen and Patel, Asma and Ullah, Insaf (2025) A Filter-Based Feature Selection Framework to Detect Phishing URLs Using Stacking Ensemble Machine Learning. Computer Modeling in Engineering & Sciences, 145 (1). pp. 1167-1187. DOI https://doi.org/10.32604/cmes.2025.070311
Abstract
Phishing is an online attack designed to obtain sensitive information such as credit card and bank account numbers, passwords, and usernames. Several anti-phishing solutions exist, including heuristic detection, visual similarity detection, blacklists and whitelists, and machine learning (ML). However, phishing attempts remain a problem, and establishing an effective anti-phishing strategy is still a work in progress. Furthermore, while most anti-phishing solutions achieve high accuracy on a given dataset, they suffer from a large number of false positives and are ineffective against zero-hour attacks. A high False Positive Rate (FPR) is costly, because phishing sites that are considered genuine can cause visitors to lose a lot of money. Feature selection is critical when developing phishing detection strategies. Good feature selection helps improve accuracy, whereas duplicate features increase noise in the dataset and reduce the accuracy of the algorithm. Therefore, a combination of filter-based feature selection methods is proposed to detect phishing attacks: constant feature removal, duplicate feature removal, quasi-constant feature removal, correlated feature removal, mutual information extraction, and Analysis of Variance (ANOVA) testing. The technique has been tested with different ML classifiers, namely Random Forest, Artificial Neural Network (ANN), AdaBoost, Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Trees, Gradient Boosting Classifiers, and Support Vector Machine (SVM), and with two types of ensemble models, stacking and majority voting, to achieve a low false positive rate. The stacked ensemble classifier (gradient boosting, random forest, support vector machine) achieves 1.31% FPR and 98.17% accuracy on Dataset 1, 3.47% FPR and 96.47% accuracy on Dataset 2, and 2.81% FPR and 97.61% accuracy on Dataset 3.
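The filter steps and the stacked ensemble named in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. The synthetic data, the column names, the thresholds, the choice to keep the union of the top-ranked features by mutual information and by ANOVA, the logistic regression meta-learner, and the convention that label 1 means phishing are all assumptions made for illustration.

```python
from functools import partial

import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.feature_selection import (SelectKBest, VarianceThreshold,
                                       f_classif, mutual_info_classif)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def filter_features(X, y, var_threshold=0.01, corr_threshold=0.9, top_k=10):
    """Return the column names kept by the filter steps (thresholds are assumed)."""
    # Constant and quasi-constant removal: drop columns whose variance
    # does not exceed var_threshold (constant columns have variance 0).
    support = VarianceThreshold(threshold=var_threshold).fit(X).get_support()
    X = X.loc[:, support]
    # Duplicate removal: keep the first of any set of identical columns.
    X = X.loc[:, ~X.T.duplicated().to_numpy()]
    # Correlated removal: drop a column if it is highly correlated
    # with any column that appears before it.
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    correlated = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    X = X.drop(columns=correlated)
    # Mutual information and ANOVA F-test: keep the union of the top-k
    # features under each score (the combination rule is an assumption).
    k = min(top_k, X.shape[1])
    mi = SelectKBest(partial(mutual_info_classif, random_state=0), k=k).fit(X, y)
    anova = SelectKBest(f_classif, k=k).fit(X, y)
    return list(X.columns[mi.get_support() | anova.get_support()])


def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), with label 1 treated as the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fp / (fp + tn)


# Synthetic stand-in for a phishing URL dataset (hypothetical features).
X_arr, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                               n_redundant=4, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(20)])
X["const"] = 1.0          # a constant feature, to be filtered out
X["dup_f0"] = X["f0"]     # an exact duplicate, to be filtered out

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Select features on the training split only, to avoid leaking test data.
kept = filter_features(X_train, y_train)

# Stacking over the three base learners named in the abstract.
stack = StackingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC())),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
)
stack.fit(X_train[kept], y_train)
y_pred = stack.predict(X_test[kept])

fpr = false_positive_rate(y_test, y_pred)
accuracy = float((y_pred == y_test).mean())
print(f"kept {len(kept)} features, FPR={fpr:.4f}, accuracy={accuracy:.4f}")
```

Running the selection on the training split alone keeps the held-out FPR and accuracy from being inflated by information from the test rows. The figures printed for this synthetic data are not comparable to those reported for the paper's datasets.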
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Phishing detection; feature selection; stacking ensemble; machine learning; phishing URL |
| Divisions: | Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
| Depositing User: | Unnamed user with email elements@essex.ac.uk |
| Date Deposited: | 17 Dec 2025 14:44 |
| Last Modified: | 17 Dec 2025 14:44 |
| URI: | http://repository.essex.ac.uk/id/eprint/42397 |
Available files
Filename: TSP_CMC_67641.pdf
Licence: Creative Commons: Attribution 4.0