ABSTRACT

An important feature of protein folding is that the amino acid sequence of a protein uniquely determines its overall structure [2], which combines secondary structure (the regions of α-helix and β-sheet) and tertiary structure (the overall folding pattern). Differences in sequence can therefore give rise to differences in secondary and tertiary structure. So far, the three-dimensional structures of approximately 6000 proteins have been determined by X-ray crystallography and NMR spectroscopy. The domains in these proteins can be grouped into approximately 350 fold families, each consisting of sequences that have similar structures [3]. It has been estimated that the total number of distinct folds is only on the order of 1000 [3-5]. This is much smaller than the number of different protein sequences encoded by the human genome, which is on the order of 100,000. Some of these folds are observed in a large number of sequences, whereas others have so far been found in only a small number of instances. The frequency with which a fold occurs is probably related to the stability of the fold or to the speed of the folding process.