ABSTRACT

The area of classification, also known as pattern recognition and supervised learning, has grown substantially over the last twenty years, primarily due to the availability of the increased computing power needed to execute sophisticated algorithms. The need for classification arises in most scientific fields, with applications ranging from disease diagnosis, to classifying galaxies by shape, to text and image classification, to tasks in the financial industry such as deciding which customers are good credit risks or constructing efficient portfolios. In bioinformatics, examples of classification tasks include assigning samples to different diseases based on gene and/or protein expression data, predicting protein secondary structure, and identifying and assigning spectra obtained from mass spectrometry to peptides and proteins. The emergence of genomic and proteomic technologies, such as cDNA microarrays, high-density oligonucleotide chips, and antibody arrays, gave considerable impetus to the field, but also introduced a number of technical challenges, since having more features (variables) than samples represented a shift from the classical paradigm.