ABSTRACT

A number of problems in computational biology can be cast as attempting to identify instances of some known sequence feature. As the simplest example, we may seek to find the locations of start codons in a sequence of DNA. Generally, when given just a small sequenced part of a chromosome, one does not know the reading frame or on which strand to expect the start codon. Recall that the reading frame of a DNA sequence indicates which nucleotide is the first nucleotide of a codon; hence, given just a sequence of DNA, there are 3 possible reading frames for each of the 2 strands. Recall also that which end of a DNA sequence is the 3’ end and which is the 5’ end is known from the sequencing reaction, since the sequence is determined by termination of a growing chain of nucleotides (see section 3.1). Hence, if we are given the following sequence of nucleotides from a sequencing experiment

AACAAGCGAA TAGTTTTGTT

we have the following 3 sets of codons (and partial codons) that are possible on this strand

AAC AAG CGA ATA GTT TTG TT

A ACA AGC GAA TAG TTT TGT T

AA CAA GCG AAT AGT TTT GTT

whereas the other strand has the following nucleotides, read 5’ to 3’ (recall that the two strands are antiparallel, so the other strand is the reverse complement of the given one)

AACAAAACTA TTCGCTTGTT

from which we get the following sets of possible codons

AAC AAA ACT ATT CGC TTG TT

A ACA AAA CTA TTC GCT TGT T

AA CAA AAC TAT TCG CTT GTT.
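The six codon partitions above can be reproduced mechanically. The following is a minimal sketch in Python, assuming the read is given 5’ to 3’ over the alphabet A, C, G, T; the function names `reverse_complement`, `reading_frames`, and `six_frames` are illustrative and do not come from the text.

```python
# Illustrative helper names; input is assumed to be an uppercase
# DNA string over A, C, G, T, written 5' to 3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, also written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def reading_frames(seq):
    """Split seq into codons for frame offsets 0, 1 and 2.

    Partial codons at either end are kept, matching the listing above.
    """
    frames = []
    for offset in range(3):
        # Leading partial codon (empty for offset 0).
        head = [seq[:offset]] if offset else []
        # Full codons, plus a trailing partial codon if one remains.
        body = [seq[i:i + 3] for i in range(offset, len(seq), 3)]
        frames.append(head + body)
    return frames


def six_frames(seq):
    """Three frames of the given strand, then three of the other strand."""
    return reading_frames(seq) + reading_frames(reverse_complement(seq))


read = "AACAAGCGAATAGTTTTGTT"
for frame in six_frames(read):
    print(" ".join(frame))
```

Run on the example read, this prints the three partitions of the given strand followed by the three partitions of its reverse complement, in the same order as listed above.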