I. INTRODUCTION

The discipline of proteomics has evolved around the core separation technologies of two-dimensional gel electrophoresis (2DGE), advanced image analysis, chromatography, capillary electrophoresis, and mass spectrometry. Ward and Humphery-Smith [1] have reviewed the methodologies and bioinformatic procedures employed within the field for protein characterization. There are numerous shortcomings associated with these procedures (see later, pg. 7); however, 2DGE currently remains unsurpassed in its ability to resolve complex mixtures of proteins (for examples, see Figs. 1 and 2). The question remains, however, as to whether these same technologies (traditional proteomics), or variants thereof, are capable of scaling to allow meaningful analyses of human tissues in health and disease across multiple organ systems and for large patient cohorts. Based on lessons learned with what until recently was the most complete proteome [2], namely the traditional proteome analysis of the smallest living organism, the bacterium Mycoplasma genitalium, the answer is clearly no. The difficulties encountered in even so small a project make plain that these methods simply do not scale to the analysis of numerous human proteomes. Thus, the above technologies need to be complemented by alternative array-based, or second-generation, approaches, i.e., analytical procedures conducted independently of the separation sciences (cf. Ref. [3] for a definition). Array-based procedures are most likely to become the tool of choice for initial target discovery, whereby large sets of patient material will need to be examined to achieve the statistical significance necessary for understanding multigenic phenomena (Fig. 3). The latter are expected to represent the greater part (∼95%) of all human ailments, as opposed to monogenic disorders (e.g., those cataloged in Mendelian Inheritance in Man) [4]. Nonetheless, rather than becoming obsolete, traditional proteomics is expected to become increasingly important in defining the nature and location of co-translational and posttranslational modifications found on molecules in health and disease. Over recent years, protein characterization has become increasingly rapid and reliable, but it has yet to be practiced on a scale akin to the throughputs achievable in genetic analysis of either DNA or mRNA. This is particularly relevant when one considers the enormity of the task at hand, i.e., the multitude of protein isoforms likely to be encountered within the human proteome. To date, little of the human proteome has been either observed or characterized, if one considers an estimated 300,000 to 500,000 expected elements awaiting discovery. This number is based on the gene content of the human genome lying somewhere between 30,000 and 50,000 open reading frames (ORFs) [5,6] and on the observations of Langen (personal communication), whereby an average of 10 isoforms was observed per protein following matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass-spectrometric analysis of approximately 150,000 high-abundance human proteins derived from 2D gels. Notably, scientists from Oxford GlycoScience (Ch. Rohlff, personal communication) have suggested that the number may represent only five times the number of human ORFs, based on their large-scale studies of human proteins. It is likely that most, if not all, human protein gene products will possess one to several co-translational and/or posttranslational modifications (PTMs). Apart from PTMs, differential splicing and protein cleavage contribute to the variety of protein gene products able to exist as isoforms, be they amidated, glycosylated, phosphorylated, myristoylated, acetylated, palmitoylated, and so forth. Humphery-Smith and Ward [1] have summarized the more commonly occurring PTMs seen in mammalian systems.
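The range of the proteome-size estimate above follows from simple arithmetic. As a minimal sketch (assuming only the figures quoted in the text: 30,000 to 50,000 ORFs, and an isoform multiplier of roughly 10 per the Langen observations, or roughly 5 per the Oxford GlycoScience estimate), the bounds can be reproduced as follows:

```python
# Back-of-envelope estimate of the number of distinct human protein
# species (isoforms), using the ORF counts and isoform multipliers
# quoted in the text. These inputs are the text's estimates, not
# measured values.

def proteome_size(n_orfs: int, isoforms_per_orf: int) -> int:
    """Estimated count of distinct protein isoforms."""
    return n_orfs * isoforms_per_orf

if __name__ == "__main__":
    for n_orfs in (30_000, 50_000):
        for k in (5, 10):  # Oxford GlycoScience vs. Langen multipliers
            total = proteome_size(n_orfs, k)
            print(f"{n_orfs:,} ORFs x {k} isoforms -> {total:,} elements")
```

With a multiplier of 10, the two ORF bounds yield the 300,000 to 500,000 figure cited; the five-fold multiplier gives a correspondingly lower range of 150,000 to 250,000.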
At the extreme, differential splicing of individual exon-rich ORFs can yield dozens of different protein isoforms; the titin gene is a case in point [7].