ABSTRACT

Algorithm ..........................................................................................80 4.6 Ending Notes ................................................................................................ 82 4.7 References ..................................................................................................... 82

Alignment of biological sequences enables discovery of evolutionary and functional relationships among them. Computing alignments, as a means to elucidate different kinds of sequence relationships, is a fundamental tool arising in numerous contexts and applications in computational biology. A number of algorithms for sequence alignments have been developed in the past few decades, the most common being the ones for pairwise global alignments (aligning sequences in their entirety [1]) and local alignments (aligning sequences that each contain a substring that is highly similar [2]). Some applications require more complex types of alignments. One such example is when aligning an mRNA sequence transcribed from a eukaryotic gene with the corresponding genomic sequence to infer the gene structure [3]. A gene consists of alternating regions called exons and introns, while the transcribed mRNA corresponds to a concatenated sequence of the exons. This requires identifying a partition of the mRNA sequence into consecutive substrings (the exons) that align to the same number of ordered, nonoverlapping, nonconsecutive substrings of the gene, a problem known as spliced alignment. Another important problem is that of syntenic alignment [4], for aligning two sequences that contain conserved substrings that occur in the same order (such as genes with conserved exons from different organisms, or long syntenic regions between genomes of two organisms). An illustration of these four kinds of alignments, showing how the regions of two sequences are aligned, is given in Figure 4.1.