ABSTRACT
While the need for comparing two sequences is clear, there are circumstances in which
one wants to compare more than two sequences at once in order to construct a multiple alignment.
For example, we may have the amino acid sequences of a particular protein from
several organisms (i.e. a set of orthologous sequences), and
we would like to determine which subsequences of amino acids are common across
all the organisms for which we have data (e.g. if we had the sequence of IL-7 from
chimpanzees we would like to align the human, mouse and chimpanzee versions of
the protein). Subsequences that are conserved across species are likely to be the
parts of the protein that must remain unchanged, despite evolutionary pressure, for
the protein to retain its function. Hence a multiple alignment can help identify such
regions. In addition, the three-dimensional shape of a folded protein is in part governed by
the local interactions between the residues that make up the peptide sequence. Thus
we expect that the presence of certain subsequences affects the three-dimensional
structure of the protein, and the ability to discover such common subsequences may
help towards the goal of computational prediction of protein structures.
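To make "identifying conserved regions" concrete, the short sketch below takes a multiple alignment that has already been computed (how to compute it is not shown here) and reports the columns, and runs of consecutive columns, in which every sequence carries the same residue. It assumes that alignments are given as equal-length strings with "-" as the gap character and that "conserved" means strict identity with no gaps, which is a simplification. The function names `conserved_columns` and `conserved_segments` and the toy sequences are hypothetical and do not come from real IL-7 data.

```python
from typing import List, Tuple


def conserved_columns(alignment: List[str], gap: str = "-") -> List[int]:
    """Return 0-based column indices where all sequences share one residue and none has a gap."""
    if not alignment:
        return []
    length = len(alignment[0])
    # A multiple alignment must have the same number of columns in every row.
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    conserved = []
    for col in range(length):
        residues = {row[col] for row in alignment}
        if len(residues) == 1 and gap not in residues:
            conserved.append(col)
    return conserved


def conserved_segments(columns: List[int], min_length: int = 1) -> List[Tuple[int, int]]:
    """Group consecutive conserved columns into half-open (start, end) runs of at least min_length."""
    segments = []
    start = prev = None
    for c in columns:
        if prev is not None and c == prev + 1:
            prev = c  # still inside the current run
            continue
        # The current run (if any) has ended; keep it if it is long enough.
        if start is not None and prev - start + 1 >= min_length:
            segments.append((start, prev + 1))
        start = prev = c
    if start is not None and prev - start + 1 >= min_length:
        segments.append((start, prev + 1))
    return segments


if __name__ == "__main__":
    # Hypothetical toy alignment of three orthologous fragments.
    aln = ["MKVLAG-DW", "MKILAGSDW", "MKVLAG-EW"]
    cols = conserved_columns(aln)
    print(cols)                          # [0, 1, 3, 4, 5, 8]
    print(conserved_segments(cols, 2))   # [(0, 2), (3, 6)]
```

Requiring exact identity is the simplest possible criterion. Practical tools usually score columns with substitution matrices or entropy so that chemically similar residues still count as partially conserved.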