ABSTRACT

From its beginnings in a small Cambridge pub to the multi-billion-dollar industry that exists today, unravelling the secrets of DNA has been one of the driving forces behind modern scientific research. Over the years, research into this elusive code has taken many forms. Since the initial mapping and open publication of the human genome, one of the most publicized applications has been microarray technology and its ability to monitor many thousands of genes simultaneously. Although the landscape of data analysis has evolved rapidly in recent years, the basic desire to capture useful information within the data remains. What was once a stronghold of the statistical community has been thrown open to the field; tried and tested methods have failed to adapt to high-dimensional, undersampled data, while the largely applied domain of intelligent data analysis has begun to excel. No single community can now stake a claim to dominance, as it is becoming clear that the only way forward is through a unity of techniques. Techniques developed for very different purposes, sometimes decades earlier, need to be rediscovered even as novel ideas are developed. The challenge now lies in applying this sometimes forgotten knowledge to environments and data foreign to its original application, and in discovering appropriate techniques from fields that may previously have been disassociated. Only by focusing on their commonality, the interface that allows a technique designed for a completely different purpose to be used in ways never before envisioned, can the full potential of these vast new volumes of data be realized.

This book brings together the disparate fields of image processing, data analysis and molecular biology, along with others to a lesser degree, such as outlier detection, statistics and signal processing, all of which capture the essence of bioinformatics. The focus here is on extracting improved results from existing cDNA microarray experiments, a technology still early in its development. Most researchers in this field rely on very simple techniques for the first stages of analysis and then apply a range of statistical techniques to clean and filter the data. It is proposed instead that these early stages are the most crucial in terms of accuracy, as any errors introduced here will propagate through all later stages of processing and analysis. With this in mind, the book presents work conducted on the preprocessing of these early stages and considers how it may be extended.