ABSTRACT

The basis for sequence alignment is that the sequences under investigation share a common ancestor. Each of the two sequences under comparison is derived by a sequence of mutations, insertions, and deletions (insertions and deletions are referred to as indels). The process of obtaining one sequence from another by these basic operations works both forward and backward in time; i.e., a deletion forward in time is an insertion if one reverses time. Hence, if $y_0$ is the common progenitor of the two sequences $y_1$ and $y_2$, our evolutionary model postulates that $y_1 = M_s(M_{s-1}(\cdots(y_0)\cdots))$ and $y_2 = M_t(M_{t-1}(\cdots(y_0)\cdots))$, where each $M_i$ represents a mutation or indel. But since mutations and indels are time reversible, we can write

$y_1 = f_J(f_{J-1}(\cdots(y_2)\cdots))$ for some sequence of mutations and indels $f_j$, $j = 1, \ldots, J$. Assuming all $f_j$ have the same impact on the function of the protein (which is not realistic but makes the statement of the problem simplest), the hypothesis that the two sequences are not functionally related is $H_0 : J \ge J_0$, where $J_0$ is large enough that the protein has lost its function. Unfortunately, even with the simplifying assumption of equal impact on functionality for all $f_j$, little is known about $J_0$, and this approach is not employed in practice.
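
The time-reversibility argument can be illustrated with a minimal Python sketch. It is not taken from the text: the operation encoding `(kind, position, residue)`, the helper names `apply_op`, `invert_op`, and `evolve`, and the toy sequences are all hypothetical. The sketch evolves a progenitor into two descendants, then reaches one descendant from the other by undoing the second lineage's operations and replaying the first lineage's. The resulting path is one valid sequence of operations, not necessarily the shortest.

```python
from typing import List, Tuple

# Hypothetical encoding: (kind, position, residue), where kind is
# "sub" (mutation), "ins" (insertion), or "del" (deletion).
Op = Tuple[str, int, str]


def apply_op(seq: str, op: Op) -> str:
    """Apply one mutation or indel to seq and return the new sequence."""
    kind, pos, residue = op
    if kind == "sub":
        return seq[:pos] + residue + seq[pos + 1:]
    if kind == "ins":
        return seq[:pos] + residue + seq[pos:]
    if kind == "del":
        return seq[:pos] + seq[pos + 1:]
    raise ValueError(f"unknown operation: {kind}")


def invert_op(seq_before: str, op: Op) -> Op:
    """Return the operation that undoes op, given the sequence it acted on."""
    kind, pos, _ = op
    if kind == "sub":
        return ("sub", pos, seq_before[pos])  # restore the old residue
    if kind == "ins":
        return ("del", pos, "")               # reversed insertion is a deletion
    if kind == "del":
        return ("ins", pos, seq_before[pos])  # reversed deletion is an insertion
    raise ValueError(f"unknown operation: {kind}")


def evolve(seq: str, ops: List[Op]) -> Tuple[str, List[Op]]:
    """Apply ops in order; also return the operations that undo them, in order."""
    inverses: List[Op] = []
    for op in ops:
        inverses.append(invert_op(seq, op))
        seq = apply_op(seq, op)
    inverses.reverse()
    return seq, inverses


# Toy progenitor and two lineages (illustrative values only).
y0 = "ACDEFG"
ops_to_y1: List[Op] = [("sub", 1, "W"), ("del", 4, "")]
ops_to_y2: List[Op] = [("ins", 2, "K"), ("sub", 0, "M")]

y1, _ = evolve(y0, ops_to_y1)
y2, back_to_y0 = evolve(y0, ops_to_y2)

# Go from y2 back to y0, then forward to y1.
path = back_to_y0 + ops_to_y1
result, _ = evolve(y2, path)
assert result == y1
```

Here `len(path)` plays the role of $J$ for this particular path. Treating every operation as one unit mirrors the equal-impact assumption, which the text itself calls unrealistic.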