ABSTRACT

Accordingly, an ambitious project called structural genomics was initiated, aimed at determining the structures of all proteins encoded by the human genome, including those of unknown function.[1] The initiative has been (and is still being) carried out by several organizations, including the American National Institutes of Health (NIH).[2] Determining the structures of all human proteins within several years seems unrealistic, as the number of these proteins is huge and the process is long and costly (see more below). Therefore, it was decided to determine the structures of a representative set of proteins, that is, proteins whose folds together represent the complete “fold space” observed in Nature. This decision was based on the premise that the number of unique folds in Nature is much smaller than the number of proteins. Although the exact number of folds is yet to be determined (see Chapter 2 for estimates), it is known that proteins tend to converge to similar structures, which supports the assumption. After the structures of all proteins included in the chosen set are determined, the structures of the rest can be predicted computationally, based on sequence similarity (see Section 3.4.3 for details on homology-based modeling).[3] Indeed, many structural scientists agree with the underlying assumption that the entire protein fold space can be represented by a set of distinct structures.[4] However, others believe that this space is more like a continuum.[5,6]
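The idea of choosing a representative set and covering the remaining proteins by sequence similarity can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `percent_identity` and `pick_representatives`, the toy sequences, and the 0.8 identity threshold are hypothetical, and the sequences are taken to be pre-aligned and of equal length. Actual target selection relies on proper sequence alignment and clustering tools and on fold classification, not on this simplified greedy pass.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of identical positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def pick_representatives(sequences: dict[str, str],
                         threshold: float = 0.8) -> dict[str, list[str]]:
    """Greedy selection of representatives.

    Each sequence joins the first representative it resembles at or above
    the threshold; otherwise it becomes a new representative itself.
    Returns a mapping: representative name -> names of proteins it covers.
    """
    clusters: dict[str, list[str]] = {}
    for name, seq in sequences.items():
        for rep in clusters:
            if percent_identity(seq, sequences[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No existing representative is similar enough.
            clusters[name] = [name]
    return clusters


# Hypothetical toy sequences (not real proteins).
toy = {
    "protA": "MKTAYIAKQR",
    "protB": "MKTAYIAKQK",
    "protC": "GSHMLEDPVV",
    "protD": "GSHMLEDPAV",
}
clusters = pick_representatives(toy, threshold=0.8)
for rep, members in clusters.items():
    print(rep, "covers", members)
```

In this toy case only two of the four sequences would need an experimentally determined structure; the other two fall within the similarity threshold of a representative and would be candidates for homology-based modeling.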

Macromolecules in general can be studied using different methods, some of which emerged as early as the beginning of the twentieth century. However, only a handful of methods are used for determining the full structures of proteins (or major parts of them). These can be roughly separated into two groups. The first includes methods that are based on the diffraction or scattering of either subatomic particles or electromagnetic waves. The second group includes spectroscopic methods, which rely on changes in the energy states of protein atoms that result from their interactions with electromagnetic radiation of different frequencies. The first part of this chapter reviews these methods. Since this is a very broad topic, the discussion focuses on the principles of the methods, as well as on their main advantages and disadvantages. The second part reviews computational methods, some of which are used for protein structure prediction, whereas others are used as a means of optimizing experimentally determined structures.