Research Repository

CES-483 Evolving Regular Expressions for GeneChip Probe Performance Prediction

Langdon, WB and Harrison, AP (2008) CES-483 Evolving Regular Expressions for GeneChip Probe Performance Prediction. CES-483, University of Essex, Colchester.




Commercial GeneChips provide highly redundant but noisy data. Rapid identification and subsequent rejection of bad data effectively increases the quality of the remaining data at little cost, whilst serving as a basis for better understanding the bio-physics of short surface-mounted DNA sequences. Affymetrix High Density Oligonucleotide Arrays (HDONA) simultaneously measure expression of thousands of genes using millions of probes. Regular expressions (REs) can be evolved from a Backus-Naur form (BNF) context-free grammar using tree-based, strongly typed genetic programming (GP) written in gawk. Fitness is given by egrep. The quality of an individual HG-U133A probe is indicated by its correlation, across 6685 human tissue samples from NCBI's GEO database, with other measurements for the same gene. Low concordance indicates a poor probe. The evolved, data-mined motif is better at predicting poor DNA sequences than an existing human-generated RE, suggesting that runs of Cytosine, runs of Guanine, and mixtures of the two should all be avoided. Section 4.6 gives more RE GP gawk implementation details.
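
The idea of scoring a candidate regular expression by how well it flags poor probes can be illustrated with a minimal sketch. It is written in Python rather than the gawk and egrep used in the report, and everything in it is an assumption made for illustration: the toy sequences, the poor/good labels, the function name `regex_fitness`, the two example patterns, and the accuracy-style score. It is not the fitness function or the evolved motif from the report.

```python
import re

# Hypothetical toy data: (probe sequence, is_poor) pairs.
# In the report, "poor" would be derived from low correlation with other
# measurements for the same gene; here the labels are simply made up.
PROBES = [
    ("ATCGGGGTACCTAGATTACGATCAA", True),
    ("ATTACGATCAGTTACGATCATTAGA", False),
    ("TTACCCCCAGTAGATCATTAGCATA", True),
    ("GATTACAGATTACAGATTACAGATT", False),
]


def regex_fitness(pattern, probes):
    """Return the fraction of probes classified correctly.

    A probe is predicted to be poor when the pattern matches anywhere in
    its sequence (similar in spirit to testing each line with egrep).
    """
    compiled = re.compile(pattern)
    correct = sum(
        (compiled.search(seq) is not None) == is_poor
        for seq, is_poor in probes
    )
    return correct / len(probes)


if __name__ == "__main__":
    # Two illustrative candidates: a run of four G, and a run of four
    # or more bases drawn from C and G in any mixture.
    for candidate in ("GGGG", "[CG]{4,}"):
        print(candidate, regex_fitness(candidate, PROBES))
```

On this toy data the mixed C/G pattern scores higher than the plain run of G, which only echoes the kind of comparison the abstract describes; a genetic programming run would generate and score many such candidates from the grammar instead of two hand-written ones.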

Item Type: Monograph (UNSPECIFIED)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Science and Health
Faculty of Science and Health > Mathematical Sciences, Department of
SWORD Depositor: Elements
Depositing User: Elements
Date Deposited: 08 Jul 2015 11:38
Last Modified: 06 Jan 2022 13:36
