ABSTRACT

The research in Chapter 4 identified several problems that arise when traditional clustering methods are applied to the analysis of full-size complementary deoxyribonucleic acid (cDNA) microarray images. These problems included computational complexity, loss of image information and data blindness. With the application of the Pyramidic Contextual Clustering (PCC) and Image Transformation Engine (ITE) components to the image data, these issues have been largely resolved. That is to say, computational complexity has been significantly reduced in time, and data loss in severity. The data blindness requirement (no prior domain knowledge) was also observed during the execution of these components, with the individual component results capturing an image’s gene spot detail very well. Such detailed gene spot information makes feasible the acquisition of a given image’s true gene block structure. Although neither the ITE nor the PCC results capture the entire gene spot regions in an image (except in the trivial case of a very clean image surface), their individual results can be combined to significantly enhance the gene spot identification process.

As detailed in Section 2.5, a cDNA microarray image consists of several
master blocks propagated across its surface, as can also be seen in Figure 5.2. In an ideal situation, these blocks will all be in perfect alignment with each other in the vertical and horizontal directions, both internally to the block and externally across the slide. There will be no rotation or skew with respect to their global positioning. They will not overlap each other or be so close as to confuse their identification at a later stage. Also, internally the meta-blocks will not suffer from any missing gene spots. Gene spots whose intensity is weak can be obscured by the local background and so appear to be missing.

However, in practice the perfect alignment of blocks within an image is
rarely the case. Master blocks are not only misaligned with respect to each other; they also tend to be rotated. Where there is large rotation in the surface, these master blocks can be difficult to acquire accurately (depending on the identification process involved). Typically, though, these rotations are small and as a result are normally insignificant to
downstream analysis. The meta-blocks suffer from similar issues, for example, missing gene spots and background contamination of gene spot regions.

The alignment and other related issues highlight the need for designing
a method that will correctly address the gene spots within a microarray image. The method should be comparable to some accepted process for the purposes of validation. Although several approaches could render a practical solution to these problems, without some form of validation step it is difficult to quantify the usefulness of a proposed process. The main aim of addressing, as it pertains to microarray imagery, is to solve the feature identification problem, which can be partitioned into two distinct processes. The first process involves master block identification and can be thought of as acquiring the overall layout of the image in question. The second process further analyses these master block locations and acquires the meta-block structures, or gene spots, therein.

The proposed solutions are focused on new techniques specifically designed
for the image composition identification tasks. As shown in Figure 5.1, the Image Layout (IL) and Image Structure (IS) components are executed after the ITE (as was discussed in Chapter 3) has created the default views.∗ These components complete the Structure Extrapolation stage and can either complement the work of the PCC component or be used independently of it.
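
To make the ideal layout described earlier in this section concrete, the following minimal Python sketch generates gene spot centres for a regular grid of master blocks and measures how far a small global rotation moves them. It is an illustration only and is not the IL or IS method. The function names, the parameters (spot_pitch, block_gap, rotation_deg) and the assumption of a single rotation about the top-left spot are all hypothetical and do not come from the text.

```python
import math


def ideal_spot_centres(block_rows, block_cols, spot_rows, spot_cols,
                       spot_pitch, block_gap, rotation_deg=0.0):
    """Return (x, y) centres for every gene spot in an idealised slide.

    Hypothetical model: master blocks sit on a regular grid, each holding
    spot_rows x spot_cols spots spaced spot_pitch apart, with block_gap
    between neighbouring blocks. The whole layout is rotated about the
    origin (the first spot of the first block) by rotation_deg.
    """
    theta = math.radians(rotation_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    block_w = (spot_cols - 1) * spot_pitch  # width spanned by spot centres
    block_h = (spot_rows - 1) * spot_pitch  # height spanned by spot centres
    centres = []
    for br in range(block_rows):
        for bc in range(block_cols):
            # Top-left spot centre of this master block before rotation.
            ox = bc * (block_w + block_gap)
            oy = br * (block_h + block_gap)
            for sr in range(spot_rows):
                for sc in range(spot_cols):
                    x = ox + sc * spot_pitch
                    y = oy + sr * spot_pitch
                    # Apply the global rotation about the origin.
                    centres.append((x * cos_t - y * sin_t,
                                    x * sin_t + y * cos_t))
    return centres


def max_displacement(reference, observed):
    """Largest Euclidean distance between corresponding spot centres."""
    return max(math.hypot(xo - xr, yo - yr)
               for (xr, yr), (xo, yo) in zip(reference, observed))


if __name__ == "__main__":
    ideal = ideal_spot_centres(2, 2, 3, 3, spot_pitch=10.0, block_gap=20.0)
    rotated = ideal_spot_centres(2, 2, 3, 3, spot_pitch=10.0, block_gap=20.0,
                                 rotation_deg=1.0)
    print(len(ideal), "spots; max displacement:",
          max_displacement(ideal, rotated))
```

Under this model the displacement grows with distance from the centre of rotation, so whether a given rotation can be ignored depends on the size of the slide relative to the spot pitch.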