ABSTRACT

Although microarray technology [186] was invented in the mid-1990s, it is still widely used in laboratories around the world today. The microarray “gene chip” contains probes for an organism’s entire transcriptome, and differing cell lines render gene lists with corresponding activation levels. These gene lists can be analyzed with various computational techniques, such as clustering [70] or modeling [119], so that the differential expressions can be translated into a clearer understanding of the underlying biological phenomena. For a detailed explanation of the microarraying process, readers may find references [47, 67, 133, 166] of interest.

Addressing the issue of microarray data quality effectively is a major challenge, particularly when dealing with real-world data, as “cracks” will appear regardless of the design specifications. These cracks can take many forms, ranging from common artifacts such as hair, dust, and scratches on the slide, to technical errors such as miscalculation of gene expression due to alignment issues or random variation in scanning laser intensity. Alongside these errors, there exists a host of biologically related artifacts, such as contamination of the complementary deoxyribonucleic acid (cDNA) solution or inconsistent hybridization of multiple samples. The focus in the microarray field has nevertheless been on analyzing the gene expression ratios themselves [44, 70, 92, 120, 167, 168] as rendered from the image sets. This means there is relatively little work directed at improving the original images [88, 89, 157, 227] so that the final expressions are more realistic.

As noise in the images has a negative effect on the correct identification and quantification of the underlying genes, in this chapter we present an algorithm that attempts to remove the biological experiment (the gene spots) from the image. In the microarray field, it is accepted as part of the analysis methodology that the background domain (non-gene-spot pixels) infringes on a gene’s valid measure and that steps must be taken to remove these inconsistencies. In effect, this removal process is equivalent to background reconstruction and should therefore produce an image that resembles the “ideal” background more closely in experimental (gene spot) regions. Subtracting this new background image from the original should in turn yield more accurate gene spot expression values. The gene expression results of the proposed reconstruction process are contrasted with those produced by GenePix [14] (a commercial system commonly used by biologists to analyze images). Results are also compared, on a like-for-like basis, with three of the aforementioned reconstruction approaches (O’Neill et al. [157], Fraser et al. [88, 89]).

The chapter is organized as follows. First, we formalize the problem area as it pertains to microarray image data and briefly explain the workings of contemporary approaches in Section 9.2. Section 9.3 discusses the fundamental idea of our approach and highlights the steps involved in the analysis. We then briefly describe the data used throughout the work and evaluate the tests carried out over both synthetic and real-world data in Section 9.4. Section 9.5 summarizes our findings and offers some observations on possible future directions.