ABSTRACT

With the release of two drafts of the human genome sequence [1,2] and its completion expected next year, the Human Genome Project has gradually shifted into what is often referred to as the Human Proteome Project (HPP). The long-term goals of the HPP include identifying, for humans and a few model organisms, (1) the majority of expressed proteins, (2) their individual posttranslational modifications, and (3) most of the macromolecules with which they interact; collectively, these would amount to a comprehensive description of the corresponding proteomes. The genome sequences already available will be crucial to accomplishing these tasks. Indeed, protein-encoding open reading frames (ORFs), spanning from the initiation codon to the stop codon, can be inferred using genome annotation algorithms [3,4]. This information is currently used in two ways. In ‘forward proteomics,’ predicted ORFs serve as guides for the identification of endogenous proteins purified from cellular extracts, whereas in ‘reverse proteomics’ [5], cloned ORFs are used to express proteins in heterologous and/or exogenous systems (Fig. 1). In this chapter, we focus on some of the challenges and strategies of reverse proteomics. In particular, we discuss the challenge of cloning (nearly) complete sets of ORFs, or ‘ORFeomes,’ and of using such cloned ORFeomes to generate protein-protein interaction maps.