ABSTRACT

This chapter discusses methods for domain prediction and for the analysis of multi-domain proteins. A domain typically combines several secondary structure elements and motifs, not necessarily contiguous in sequence, packed into a compact globular structure. Domain prediction is an important problem in computational biology and also a difficult one, because there is no widely accepted, precise, and consistent definition of a domain. Algorithms that predict domains from sequence information alone are popular because they are relatively fast, which makes them well suited to large-scale domain prediction across sequence databases. Domain prediction methods may produce an output for each position that reflects the probability that the position lies in a domain core or at a domain boundary. Prediction accuracy can potentially be improved by majority voting among multiple methods.
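Combining per-position outputs by majority voting can be illustrated with a minimal sketch. It rests on assumptions that do not come from the chapter. Each method is assumed to report, for every residue, a probability that the residue is a domain boundary. A method votes "boundary" when that probability is at least 0.5, and a residue is called a boundary only when a strict majority of methods vote for it. The function name `majority_vote_boundaries`, the threshold, and the toy scores are hypothetical and illustrative only, not results from any real predictor.

```python
from typing import List, Sequence


def majority_vote_boundaries(
    predictions: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[int]:
    """Hypothetical consensus of per-position boundary predictions.

    predictions[m][i] is method m's probability (assumed input format)
    that position i is a domain boundary. A method votes "boundary" when
    its probability is >= threshold; the consensus marks a position as a
    boundary (1) only if a strict majority of methods vote for it, else 0.
    """
    if not predictions:
        raise ValueError("need at least one method")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all methods must cover the same sequence length")

    n_methods = len(predictions)
    consensus = []
    for i in range(length):
        # Count how many methods call position i a boundary.
        votes = sum(1 for p in predictions if p[i] >= threshold)
        # Strict majority: ties (e.g. 1 of 2 methods) are not boundaries.
        consensus.append(1 if votes * 2 > n_methods else 0)
    return consensus


# Three hypothetical methods scoring a toy 6-residue sequence.
method_a = [0.1, 0.2, 0.9, 0.8, 0.1, 0.0]
method_b = [0.0, 0.6, 0.7, 0.3, 0.2, 0.1]
method_c = [0.2, 0.1, 0.8, 0.6, 0.7, 0.0]
print(majority_vote_boundaries([method_a, method_b, method_c]))  # [0, 0, 1, 1, 0, 0]
```

Positions 1 and 4 are each flagged by only one method, so the vote suppresses them, while positions 2 and 3 are supported by at least two of the three methods. A strict-majority rule with an odd number of methods avoids ties. Other combination schemes, such as averaging probabilities or weighting methods by their known accuracy, are equally plausible.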