Wang, Yucheng (2023) DNA Methylation Analysis and Age Prediction. Doctoral thesis, University of Essex.
Abstract
DNA methylation microarrays have been the most cost-effective choice for large cohort studies investigating associations between methylome changes and diseases or environmental exposures. The discovery of many CpG sites across the genome whose methylation levels are highly correlated with age has led to the construction of various epigenetic age estimation models, also known as epigenetic clocks. However, the mechanisms that drive age-associated methylation changes remain largely unclear. In this thesis, the first two chapters each describe a novel bioinformatic tool for analyzing DNA methylation microarray data; the third chapter then re-examines the existing claim that the cerebellum ages slowly.

Many samples on the Gene Expression Omnibus (GEO) lack a sex annotation or are incorrectly labelled. Given the influence of sex on DNA methylation patterns, methods for filtering poor-quality samples and checking sex assignments need to be accurate and widely applicable. In the first chapter, I present a novel method to predict sample sex using only DNA methylation beta values, which can be readily applied to almost all DNA methylation datasets regardless of their format. I first identified 4,345 CpG sites present on both the 450K and EPIC arrays that are differentially methylated between females and males. I then constructed a sex classifier by combining the first two principal components of the DNA methylation data of the sex-associated probes mapped to the sex chromosomes. The classifier was built using whole blood samples and performs well across a wide range of tissues. I also demonstrate that it can identify samples with sex chromosome aneuploidy, a capability validated on five Turner syndrome cases and one Klinefelter syndrome case.

Data normalization is an essential step to reduce technical variation within and between arrays.
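As an illustration only, the sex classifier from the first chapter can be sketched in Python with NumPy. The abstract does not say how the two principal components are combined or how the resulting groups are labelled, so this sketch assumes a simple 2-means clustering in the PC1/PC2 plane, uses a few samples of trusted sex (the hypothetical `known_female_idx`) to decide which cluster is female, and flags samples lying far from their cluster centre as candidates for aneuploidy review. The input is assumed to be a samples × probes matrix of beta values already restricted to the sex-associated probes on the sex chromosomes; all function names and the `outlier_factor` threshold are hypothetical.

```python
import numpy as np


def pca_scores(beta, n_components=2):
    """Project samples (rows) onto the leading principal components via SVD."""
    centered = beta - beta.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def two_means(points, n_iter=100):
    """Minimal 2-means clustering; returns (labels, centroids)."""
    # Deterministic start: the two samples at the extremes of PC1.
    start = [points[:, 0].argmin(), points[:, 0].argmax()]
    centroids = points[start].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = centroids.copy()
        for k in range(2):
            if np.any(labels == k):  # keep the old centroid if a cluster empties
                new[k] = points[labels == k].mean(axis=0)
        converged = np.allclose(new, centroids)
        centroids = new
        if converged:
            break
    return labels, centroids


def classify_sex(beta, known_female_idx, outlier_factor=5.0):
    """Assign 'F'/'M' per sample and flag samples far from their cluster centre.

    known_female_idx: indices of samples whose female sex is trusted
    (hypothetical input, used only to decide which cluster is labelled 'F').
    """
    scores = pca_scores(beta)
    labels, centroids = two_means(scores)
    female_cluster = np.bincount(labels[known_female_idx], minlength=2).argmax()
    sex = np.where(labels == female_cluster, "F", "M")
    # Distance of each sample to its own cluster centre in PC space; a sample
    # with, e.g., a male-like X but no Y signal falls between the clusters.
    dist = np.linalg.norm(scores - centroids[labels], axis=1)
    flagged = np.flatnonzero(dist > outlier_factor * np.median(dist))
    return sex, flagged
```

Flagged samples in such a sketch are only candidates; on real data they would still need to be checked against the sex-chromosome probe values directly.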
Because of their different karyotypes and the effects of X chromosome inactivation, females and males exhibit distinct methylation patterns on the sex chromosomes, which makes it difficult to normalize sex chromosome data without introducing bias. Existing methods do not provide an unbiased way to normalize sex chromosome data; they usually process autosomal and sex chromosome probes indiscriminately. In chapter 2, I first demonstrate that ignoring this sex difference introduces artificial sex bias, in particular at thousands of autosomal CpGs. I then developed a novel two-step strategy (interpolatedXY) to address this issue, which is applicable to all quantile-based normalization methods. In this strategy, the autosomal CpGs are first normalized independently by a conventional method, such as funnorm [1] or dasen [2]; the corrected methylation values of sex chromosome-linked CpGs are then estimated as the weighted average of their nearest neighbors on the autosomes. The two-step strategy can also be applied to non-quantile-based normalization methods and to other array-based data types.

Although different tissues have vastly different rates of proliferation, it is still largely unknown whether they age at different rates. It was previously reported that the cerebellum ages slowly; however, this claim was based on a single methylation clock and a small sample size, and thus warrants further investigation. In chapter 3, I first collected the largest cerebellum DNAm dataset (N=752) and found that the epigenetic ages of these samples were consistently and severely underestimated by six representative DNAm age clocks, with more pronounced underestimation by the four clocks whose training datasets did not include brain-related tissues. I then identified 613 age-associated CpGs in the cerebellum, only 14.5% of the number found in the middle temporal gyrus from the same population (N=404).
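The second step of the interpolatedXY strategy from chapter 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: within each sample, the "nearest neighbors" of a sex-chromosome probe are assumed to be the two autosomal probes whose raw values bracket it, and the weighted average is taken to be linear interpolation between their normalized values (weights inversely proportional to distance in raw-value space). The function name and the probes × samples layout are hypothetical, and the autosomal normalized values can come from any method.

```python
import numpy as np


def interpolate_sex_probes(raw, auto_normalized, is_auto):
    """Fill in normalized values for sex-chromosome probes, sample by sample.

    raw:             probes x samples array of raw (unnormalized) values
    auto_normalized: (autosomal probes) x samples array from any normalization
    is_auto:         boolean mask over probes, True for autosomal probes
    """
    raw = np.asarray(raw, dtype=float)
    auto_normalized = np.asarray(auto_normalized, dtype=float)
    is_auto = np.asarray(is_auto, dtype=bool)
    out = np.empty_like(raw)
    out[is_auto] = auto_normalized  # step 1 results are kept unchanged
    for j in range(raw.shape[1]):
        # Sorted unique raw autosomal values; tied raw values share the mean
        # of their normalized values so that np.interp gets increasing xp.
        xp, inverse = np.unique(raw[is_auto, j], return_inverse=True)
        fp = np.bincount(inverse, weights=auto_normalized[:, j]) / np.bincount(inverse)
        # Linear interpolation = distance-weighted average of the two
        # bracketing autosomal neighbours; values outside the autosomal
        # range are clamped to the nearest end point.
        out[~is_auto, j] = np.interp(raw[~is_auto, j], xp, fp)
    return out
```

Because the sex-chromosome probes never enter step 1 in this sketch, the autosomal normalization cannot be shifted by the female/male differences on X and Y.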
Subsequently, I built a highly accurate age prediction model for the cerebellum named CerebellumClockspecific (Pearson correlation=0.941, mean absolute deviation=3.18 years). Ageing rate comparisons based on the two tissue-specific clocks constructed on the 201 overlapping age-associated CpGs support the conclusion that the cerebellum has a younger DNAm age. Nevertheless, I constructed BrainCortexClock to show that a single DNAm clock can estimate the DNAm ages of both cerebellum and cerebral cortex without bias when both tissues are adequately and equally represented in the training dataset. In conclusion, comparing ageing rates across tissues using multi-tissue DNA methylation clocks is flawed. The large underestimation of cerebellum age by previous clocks mainly reflects improper use of these clocks. There are strong and consistent ageing effects on the cerebellar methylome, and we suggest that the smaller number of age-associated CpG sites in the cerebellum is largely attributable to its extremely low average cell replication rate.

In summary, the sex classifier presented in the first chapter provides a robust and widely applicable tool to identify the sex of DNA methylation samples; it can be used to annotate sex and to identify sex-mismatched samples. The second chapter presents a novel two-step strategy that avoids introducing artifactual sex bias when female and male samples are normalized together by conventional normalization methods. The last chapter reveals the unique age-associated methylome changes in the cerebellum, presents a cerebellum-specific clock that accurately predicts cerebellum age, and demonstrates that comparing ageing rates across tissues using epigenetic clocks is flawed. These findings have wider implications for the use of ageing clocks.
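The accuracy figures quoted for CerebellumClockspecific, and the underestimation reported for earlier clocks, can be computed from predicted and chronological ages roughly as in the sketch below. It assumes that "mean absolute deviation" means the mean absolute difference between predicted and chronological age, and it summarizes under- or overestimation as the mean signed difference (negative values indicate underestimation). The function name is hypothetical and the ages in the example are toy values, not thesis data.

```python
import numpy as np


def clock_metrics(predicted, chronological):
    """Summarize how well a DNAm clock's predictions track chronological age."""
    predicted = np.asarray(predicted, dtype=float)
    chronological = np.asarray(chronological, dtype=float)
    diff = predicted - chronological
    return {
        # Pearson correlation between predicted and chronological age.
        "pearson_r": float(np.corrcoef(predicted, chronological)[0, 1]),
        # Mean absolute difference in years (assumed definition of MAD).
        "mad_years": float(np.mean(np.abs(diff))),
        # Mean signed difference; negative means ages are underestimated.
        "mean_offset_years": float(np.mean(diff)),
    }


# Toy example: a clock that systematically predicts ages too low.
metrics = clock_metrics([18, 35, 55, 70], [20, 40, 60, 80])
print(metrics["mad_years"], metrics["mean_offset_years"])  # 5.5 -5.5
```

The toy case also shows why correlation alone is not enough to compare clocks: the predictions correlate almost perfectly with age yet are systematically too low, which is the pattern the abstract describes for earlier clocks applied to the cerebellum.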
| Field | Value |
|---|---|
| Item Type | Thesis (Doctoral) |
| Divisions | Faculty of Science and Health > Computer Science and Electronic Engineering, School of |
| Depositing User | Yucheng Wang |
| Date Deposited | 29 Jun 2023 15:47 |
| Last Modified | 29 Jun 2023 15:47 |
| URI | http://repository.essex.ac.uk/id/eprint/35887 |
Available files
Filename: PhD_Thesis_1906806.pdf