ABSTRACT

This chapter describes methods for recognizing domain folds within multidomain structures. It considers methods for clustering protein structures into fold groups and evolutionarily related families. The chapter reviews hierarchical classifications of protein structures together with the statistics of these classifications, i.e. the populations of homologous superfamilies, fold groups and protein architectures. It also describes the use of protein structure classifications to benchmark sequence comparison methods. A large proportion of the structures in the Protein Data Bank (PDB) are nearly identical; many correspond to single-residue mutants that were determined to establish the effect of a mutation on the structure and function of the protein. The concept of domain recurrence can also be used to validate domain boundaries within multidomain proteins: about 70% of the domain structures within multidomain proteins recur in different domain contexts or as single-domain proteins. Most of the larger classifications use similar hierarchical levels, which correspond largely to phylogenetic (evolutionary) and phenetic (purely structural) similarities between the data.
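To make the domain-recurrence statistic concrete, the following is a minimal Python sketch, not the chapter's own procedure. It assumes each protein is represented as a tuple of hypothetical superfamily labels (its domain architecture) and counts a domain in a multidomain protein as recurring if its superfamily also appears in a different architecture, including a single-domain protein. The function name `recurrence_fraction` and the toy labels are illustrative only.

```python
from collections import defaultdict


def recurrence_fraction(proteins):
    """Fraction of domains in multidomain proteins that recur elsewhere.

    Each protein is a tuple of (hypothetical) superfamily labels, one per
    domain. A domain "recurs" if its superfamily also occurs in a different
    domain architecture, either a single-domain protein or a multidomain
    protein with another combination of domains.
    """
    # Map each superfamily to the set of distinct architectures it occurs in.
    contexts = defaultdict(set)
    for arch in proteins:
        for sf in arch:
            contexts[sf].add(tuple(arch))

    total = recurring = 0
    for arch in proteins:
        if len(arch) < 2:
            continue  # only domains within multidomain proteins are counted
        for sf in arch:
            total += 1
            # Identical architectures do not count as a different context.
            if contexts[sf] - {tuple(arch)}:
                recurring += 1
    return recurring / total if total else 0.0


# Toy data: A recurs as a single-domain protein, B recurs with E,
# while C, D and E occur in only one architecture each.
toy = [("A", "B"), ("A",), ("C", "D"), ("B", "E")]
print(recurrence_fraction(toy))  # 0.5
```

Using sets of distinct architectures means that redundant copies of the same protein, such as the near-identical PDB entries mentioned above, do not inflate the recurrence count.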