ABSTRACT

Current proteomic investigations can generate large amounts of data for a relatively small number of samples representing different classes. These classes can represent diseased versus non-diseased patients or tumor cells from different organs. Computationally, these data sets can be used to classify the samples. This chapter outlines some of the available characterization procedures. The emphasis is on showing that, because the data sets contain far more measured features than samples, it is very easy to numerically separate one class of samples from another, but creating a biologically realistic classification model is much harder. Doing so requires choosing a small number of relevant features from the data set and using them to build the classification model. Described here are methods for scaling the data set, searching for outliers, choosing relevant features, building classification models, and then determining the characteristics of the models.