ABSTRACT

All genes in contemporary organisms on Earth have relatives that originated from a common ancestor. Genes or gene products (proteins and RNAs) derived from a common ancestor are said to be homologous to one another and constitute a family or a superfamily. Homologous relationships may also apply to individual nucleotides or amino acids (collectively called residues) within a family of genes or proteins. Multiple sequence alignment (MSA) aims to reproduce these homologous relationships among individual residues in a set of gene or protein sequences. Mutations (substitutions, deletions, and insertions) accumulated during evolution obscure these inter-residue homologous relationships and sometimes make them barely detectable. Hence, obtaining a reliable MSA from a set of remotely related sequences is both vital and difficult. Meanwhile, sequence data awaiting analysis are accumulating at an ever-increasing rate, so fast and reliable MSA methods are urgently needed. The main purpose of this chapter is to introduce a variety of computational methods for tackling this difficult problem. We try to cover both theoretical and practical approaches. However, because the field is too broad to be covered fully in a single chapter, we focus mainly on the global alignment of protein sequences. For recent reviews and book chapters on these topics, the reader is referred to [9, 16, 17, 34, 76, 72].