ABSTRACT

The study of molecular evolution began in the 1960s, when only a few protein sequences were available, notably cytochrome c and hemoglobin. These sequences were known for a variety of organisms and, under the assumption that closely related organisms have similar sequences, family trees were constructed from them. Biology provides the motivation for aligning sequences and for asking how difficult alignment is. Estimating the number of sequence alignments is then a mathematical task. To define an alignment from a trace, it remains to specify the order of the deletions. The weights of the arcs must depend on the inserted or deleted letters as well as on the visited nodes. Deletion of both sequences corresponds to a single arc from source to sink. In some applications, the weights given to substitutions depend on the positions in the sequences. It is routine to generalize our algorithms to this more general setting.
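
To give a sense of scale for the number of sequence alignments mentioned above, the following is a minimal sketch in Python. It assumes a common counting convention that may differ from the one used in the body of the paper: each alignment column is either a pair of aligned letters, a letter over a gap, or a gap over a letter, and adjacent gaps in a different order count as distinct alignments. Under that assumption the count for sequences of lengths n and m satisfies f(n, m) = f(n-1, m) + f(n, m-1) + f(n-1, m-1), with f(n, 0) = f(0, m) = 1. The function name `count_alignments` is hypothetical.

```python
def count_alignments(n: int, m: int) -> int:
    """Count alignments of two sequences of lengths n and m.

    Assumed convention (not taken from the paper): the last column of an
    alignment is a letter over a gap, a gap over a letter, or two aligned
    letters, so the counts for the three shorter prefix pairs are summed.
    """
    # table[i][j] holds the count for prefixes of lengths i and j.
    # Row 0 and column 0 stay at 1: aligning against an empty sequence
    # can only be done with gaps, in exactly one way.
    table = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = (
                table[i - 1][j]        # letter of the first sequence over a gap
                + table[i][j - 1]      # gap over a letter of the second sequence
                + table[i - 1][j - 1]  # two letters aligned
            )
    return table[n][m]


if __name__ == "__main__":
    for length in (1, 2, 3, 10):
        print(length, count_alignments(length, length))
```

Even under this simple convention the count grows quickly: two sequences of length 3 already admit 63 alignments, which is why exhaustive enumeration is impractical and the estimate is worth treating mathematically.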