Khan, Z and Gul, A and Perperoglou, A and Miftahuddin, M and Mahmoud, O and Adler, W and Lausen, B (2016) An Ensemble of Optimal Trees for Classification and Regression (OTE). UNSPECIFIED. Elsevier.
Abstract
The predictive performance of a random forest ensemble is highly associated with the strength of the individual trees and their diversity. An ensemble of a small number of accurate and diverse trees, provided prediction accuracy is not compromised, also reduces the computational burden. We investigate the idea of integrating trees that are accurate and diverse. For this purpose, we use the out-of-bag observations from the training bootstrap samples as a validation sample to choose the best trees based on their individual performance, and then assess these trees for diversity using the Brier score. Starting from the first best tree, a tree is selected for the final ensemble if its addition to the forest reduces the error of the trees that have already been added. A total of 35 benchmark problems on classification and regression are used to assess the performance of the proposed method and to compare it with kNN, tree, random forest, node harvest and support vector machine. We compute unexplained variances and classification error rates for all the methods on the corresponding data sets. Our experiments reveal that the size of the ensemble is reduced significantly and that better results are obtained in most of the cases. For further verification, a simulation study is also given in which four tree-style scenarios are considered to generate data sets with several structures.
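The selection procedure described in the abstract can be illustrated with a minimal sketch for binary classification. This is not the authors' implementation: the function names, the `keep_fraction` parameter, and the use of a separate validation array for the Brier score step are assumptions made for illustration, since the abstract does not specify these details. The sketch ranks trees by their out-of-bag error, keeps the best share, and then adds each remaining tree only if doing so lowers the Brier score of the averaged ensemble.

```python
import numpy as np


def brier_score(prob, y):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return float(np.mean((prob - y) ** 2))


def select_trees(oob_errors, val_probs, y_val, keep_fraction=0.2):
    """Greedy tree selection (illustrative sketch, hypothetical interface).

    oob_errors    : per-tree out-of-bag error, shape (n_trees,)
    val_probs     : per-tree predicted P(class 1) on validation data,
                    shape (n_trees, n_samples)
    y_val         : 0/1 validation labels, shape (n_samples,)
    keep_fraction : assumed share of top-ranked trees to consider
    """
    oob_errors = np.asarray(oob_errors, dtype=float)
    val_probs = np.asarray(val_probs, dtype=float)
    y_val = np.asarray(y_val, dtype=float)

    # Step 1: rank trees by individual out-of-bag error, keep the best ones.
    n_keep = max(1, int(round(keep_fraction * len(oob_errors))))
    ranked = np.argsort(oob_errors, kind="stable")[:n_keep]

    # Step 2: start from the single best tree.
    selected = [int(ranked[0])]
    running_sum = val_probs[ranked[0]].copy()
    best = brier_score(running_sum, y_val)

    # Step 3: add a tree only if the ensemble's Brier score decreases.
    for t in ranked[1:]:
        candidate_sum = running_sum + val_probs[t]
        score = brier_score(candidate_sum / (len(selected) + 1), y_val)
        if score < best:
            selected.append(int(t))
            running_sum = candidate_sum
            best = score

    return selected, best
```

In this sketch a tree that is accurate on its own can still be rejected if averaging it in does not improve the ensemble, which is how the Brier score check acts as a proxy for diversity.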
Item Type: | Monograph (UNSPECIFIED) |
---|---|
Uncontrolled Keywords: | classification and regression trees; random forest; ensemble methods; accuracy and diversity |
Subjects: | Q Science > QA Mathematics |
Divisions: | Faculty of Science and Health; Faculty of Science and Health > Mathematics, Statistics and Actuarial Science, School of |
SWORD Depositor: | Unnamed user with email elements@essex.ac.uk |
Depositing User: | Unnamed user with email elements@essex.ac.uk |
Date Deposited: | 19 Sep 2016 09:17 |
Last Modified: | 16 May 2024 18:23 |
URI: | http://repository.essex.ac.uk/id/eprint/17595 |
Available files
Filename: OTE-Khan-et-al-preprint-BLG-DRC-17Sept16.pdf