ABSTRACT

While the need for comparing two sequences is clear, there are circumstances in which
one wants to compare more than two sequences in order to construct a multiple alignment.
For example, we may have the amino acid sequences of a particular protein from several
organisms (i.e. a set of orthologous sequences), and we would like to determine which
subsequences of amino acids are common across all the organisms for which we have data
(e.g. if we had the sequence of IL-7 from chimpanzees, we would like to align the human,
mouse and chimpanzee versions of the protein). The subsequences that are conserved
across species are likely to be the parts of the protein that must be preserved in the
face of evolutionary pressure for the protein to retain its function. Hence a multiple
alignment can help identify such regions. In addition, the three-dimensional shape of a
folded protein is governed in part by the local interactions between the residues that
make up the peptide sequence. Thus we expect that the presence of certain subsequences
would affect the three-dimensional structure of the protein, and the ability to discover
such common subsequences may help towards the goal of computational prediction of
protein structures.