ABSTRACT

Predicting low-complexity regions is not equivalent to predicting disorder. The two are clearly related, however, and low-complexity regions are often associated with non-globular regions of proteins. As detailed in Chapter 10, Section 10.1.2, low complexity is defined by the informational entropy function of Shannon (Shannon 1948), adapted to protein sequences by Wootton (Wootton 1994a, b). Wootton observed that non-globular segments of proteins deviate significantly from the random composition observed for globular proteins and domains, because a few amino acids dominate and/or the sequence is repetitive (see Chapter 10, Figure 10.2). Based on this principle, the SEG program was developed to identify such compositionally biased segments (Wootton 1994a, b). SEG first calculates the local complexity of segments of a given size using the Shannon entropy, then extends and merges the low-complexity segments, and finally reduces them to a single optimal low-complexity region. Because low complexity and disorder are related (Romero et al. 2001), this approach has definite value in delineating non-globular and possibly disordered regions of proteins.
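
The window-based entropy step described above can be illustrated with a minimal Python sketch. This is not the SEG implementation: it only scores fixed-length windows by Shannon entropy and merges overlapping low-scoring windows, and it omits SEG's extension and optimization stages. The function names, the window length of 12, and the threshold of 2.2 bits are illustrative assumptions, not values taken from this chapter.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals sorted by start."""
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def low_complexity_windows(sequence, window=12, threshold=2.2):
    """Return merged half-open (start, end) regions whose windows have
    entropy at or below the threshold.

    The window and threshold defaults are assumed, illustrative values.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) <= threshold:
            hits.append((start, start + window))
    return merge_intervals(hits)


if __name__ == "__main__":
    # Hypothetical sequence: a poly-Q run flanked by compositionally diverse segments.
    seq = "ACDEFGHIKLMN" + "Q" * 12 + "ACDEFGHIKLMN"
    for start, end in low_complexity_windows(seq):
        print(start, end, seq[start:end])
```

Because every window that overlaps the biased run sufficiently falls below the threshold, the reported region extends somewhat beyond the run itself. Trimming such a raw region to its optimal core is the role of the final reduction step that SEG performs and that this sketch leaves out.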