Research Repository

CES-483 Evolving Regular Expressions for GeneChip Probe Performance Prediction

Langdon, WB and Harrison, AP (2008) CES-483 Evolving Regular Expressions for GeneChip Probe Performance Prediction. UNSPECIFIED. CES-483, University of Essex, Colchester.

Abstract

Commercial GeneChips provide highly redundant but noisy data. Rapidly identifying and rejecting bad data effectively increases the quality of the remaining data at little cost, while also serving as a basis for better understanding the biophysics of short surface-mounted DNA sequences. Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure the expression of thousands of genes using millions of probes. Regular expressions (REs) can be evolved from a Backus-Naur form (BNF) context-free grammar using tree-based, strongly typed genetic programming (GP) written in gawk, with fitness given by egrep. The quality of each individual HG-U133A probe is indicated by its correlation, across 6685 human tissue samples from NCBI's GEO database, with other measurements of the same gene; low concordance indicates a poor probe. The evolved, data-mined motif is better at predicting poor DNA sequences than an existing human-generated RE, suggesting that runs of Cytosine, runs of Guanine, and mixtures of the two should all be avoided. Section 4.6 gives further details of the gawk implementation of the RE GP.
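The two ingredients named above, deriving candidate REs from a BNF grammar and scoring each one by how well its matches flag poor probes, can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's method: the grammar, the fitness measure (fraction of probes whose match/no-match agrees with a bad-probe label), the names `GRAMMAR`, `generate`, `fitness`, `best_of` and the toy probes are all hypothetical. Python's `re.search` stands in for egrep (the constructs used, `|`, `{n,}` and `[...]`, behave the same in POSIX extended REs), and best-of-N random sampling stands in for the strongly typed tree GP written in gawk.

```python
import random
import re

# Hypothetical BNF-style grammar for DNA motifs (NOT the grammar from CES-483).
# Each non-terminal maps to its alternative productions; any symbol that is not
# a key of GRAMMAR is a terminal and is emitted verbatim.
GRAMMAR = {
    "<re>": [["<term>"], ["<term>", "|", "<re>"]],
    "<term>": [["<atom>"], ["<atom>", "<term>"]],
    "<atom>": [["<base>"], ["<base>", "{", "<count>", ",}"], ["[", "<base>", "<base>", "]"]],
    "<base>": [["A"], ["C"], ["G"], ["T"]],
    "<count>": [["2"], ["3"], ["4"], ["5"]],
}


def generate(symbol, rng, depth=0, max_depth=6):
    """Randomly derive a string from `symbol`; every derivation is a valid RE."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, drop directly recursive productions so the
        # derivation is guaranteed to terminate.
        options = [p for p in options if symbol not in p] or options
    production = rng.choice(options)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in production)


def fitness(pattern, probes):
    """Fraction of (sequence, is_bad) pairs where 'RE matches' equals 'probe is bad'."""
    rx = re.compile(pattern)
    agree = sum((rx.search(seq) is not None) == is_bad for seq, is_bad in probes)
    return agree / len(probes)


def best_of(probes, n=200, seed=0):
    """Random-sampling stand-in for evolution: best (score, pattern) of n derivations."""
    rng = random.Random(seed)
    return max((fitness(p, probes), p) for p in (generate("<re>", rng) for _ in range(n)))


# Made-up sequences with invented labels, for illustration only (not GEO data).
TOY_PROBES = [
    ("ATGCGGGGGTACATGCATTACGTAC", True),
    ("ACGTTAGCATCGATCGTAGCTAGCA", False),
    ("TTCCCCCATGCATGCATAGCTAGTA", True),
    ("AGCTAGCTTAGCATCGATGCATGCA", False),
]

if __name__ == "__main__":
    print(fitness("G{4,}", TOY_PROBES))          # 0.75: misses the C-run probe
    print(fitness("G{4,}|C{4,}", TOY_PROBES))    # 1.0 on this toy set
    print(best_of(TOY_PROBES))
```

In a real GP run, crossover and mutation would operate on the derivation trees rather than resampling from scratch; keeping candidates as grammar derivations is what guarantees that every offspring is still a syntactically valid RE.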

Item Type: Monograph (UNSPECIFIED)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health > Mathematical Sciences, Department of
Depositing User: Carla Watkins
Date Deposited: 08 Jul 2015 11:38
Last Modified: 17 Aug 2017 17:35
URI: http://repository.essex.ac.uk/id/eprint/14258
