Suwanwong, Anusa (2026) Advancing feature selection techniques for supervised machine learning within functional genomic experiments by means of overlapping analysis. Doctoral thesis, University of Essex. DOI https://doi.org/10.5526/ERR-00043017
Abstract
Microarray technology enables the simultaneous measurement of tens of thousands of genes (features) across a small number of tissue samples (observations). This high dimensionality has a great impact on classification tasks, since most genes are noisy, redundant or irrelevant. A statistical learning approach aims at understanding and modelling complex datasets. Given a set of training data, its primary goal is to create a model that captures the relationship between a set of input features and the corresponding response in a predictive manner. Applying classification methods to microarray data is therefore a crucial task, which helps reduce dimensionality as well as categorise biological samples into distinct classes, such as different stages of a disease. The prediction accuracy and interpretability of a model can be improved when the learning process is conducted using only the selected informative features. Two novel statistical methods are proposed: 3-class Proportional Overlapping Scores (3cPOS) and multiple Proportional Overlapping Scores (mPOS). Both methods exploit overlapping analysis to measure the level of overlap between the expression intervals of different classes, resulting in a 3cPOS or mPOS score for each gene. These scores help identify the informative genes (features) in three-class and multi-class problems, respectively. A smaller 3cPOS or mPOS score indicates a higher discriminative capability of the gene. The 3cPOS and mPOS methods are validated on several publicly available gene expression datasets using widely used classifiers, examining the impact of feature selection on model performance through classification accuracy. Selection stability is also used to assess the biological knowledge captured in the obtained results. The experimental results reveal that 3cPOS performs better than the comparative feature selection methods, and that mPOS either outperforms them or demonstrates comparable performance. Both methods consistently deliver reliable performance, even with limited sample sizes, underscoring their versatility and effectiveness in gene selection.
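The general idea of overlapping analysis can be illustrated with a minimal sketch. This is not the 3cPOS or mPOS formula from the thesis, which the abstract does not state. It assumes, purely for illustration, that each class's expression interval is its min-max range and that a gene's score is the proportion of samples falling where any two class intervals overlap. The function names `class_intervals`, `overlap_score` and `rank_genes` are hypothetical.

```python
from itertools import combinations


def class_intervals(values, labels):
    """Return {class: (min, max)} of one gene's expression values.

    Assumption: the interval is the plain min-max range of each class.
    """
    intervals = {}
    for v, c in zip(values, labels):
        lo, hi = intervals.get(c, (v, v))
        intervals[c] = (min(lo, v), max(hi, v))
    return intervals


def overlap_score(values, labels):
    """Proportion of samples lying in the overlap of any pair of class intervals.

    Illustrative stand-in only, not the thesis's 3cPOS/mPOS definition.
    A smaller score means the classes are better separated on this gene.
    """
    intervals = class_intervals(values, labels)
    overlaps = []
    for a, b in combinations(sorted(intervals), 2):
        lo = max(intervals[a][0], intervals[b][0])
        hi = min(intervals[a][1], intervals[b][1])
        if lo <= hi:  # the two class intervals intersect
            overlaps.append((lo, hi))
    in_overlap = sum(any(lo <= v <= hi for lo, hi in overlaps) for v in values)
    return in_overlap / len(values)


def rank_genes(expression, labels):
    """Sort gene names by ascending overlap score (most discriminative first).

    expression: {gene_name: [value per sample]}, aligned with labels.
    """
    scores = {g: overlap_score(vals, labels) for g, vals in expression.items()}
    return sorted(scores, key=scores.get)


# Toy data (made up for illustration): six samples from three classes.
labels = ["A", "A", "B", "B", "C", "C"]
expression = {
    "g_separated": [1, 2, 5, 6, 9, 10],    # class intervals are disjoint
    "g_overlapping": [1, 5, 4, 8, 7, 10],  # neighbouring intervals intersect
}
ranking = rank_genes(expression, labels)
```

On this toy data the gene with disjoint class intervals scores zero and is ranked first, matching the abstract's statement that smaller scores indicate higher discriminative capability. A min-max range is sensitive to outliers, so a practical method would likely use a more robust interval; that choice is left open here.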
| Item Type: | Thesis (Doctoral) |
|---|---|
| Subjects: | Q Science > QA Mathematics; Q Science > QA Mathematics > QA76 Computer software |
| Divisions: | Faculty of Science and Health > Mathematics, Statistics and Actuarial Science, School of |
| Depositing User: | Anusa Suwanwong |
| Date Deposited: | 14 Apr 2026 11:38 |
| Last Modified: | 14 Apr 2026 11:38 |
| URI: | http://repository.essex.ac.uk/id/eprint/43017 |
Available files
Filename: Anusa Final PhD_1.pdf