ABSTRACT
A number of problems in computational biology can be cast as attempting to identify
instances of some known sequence feature. As the simplest example, we may
seek to find the locations of start codons in a sequence of DNA. Generally, when given
just a small sequenced part of a chromosome, one doesn’t know the reading frame
or on which strand to expect the start codon. Recall that the reading frame of a DNA
sequence indicates which nucleotide is the first nucleotide of a codon; hence, given just
a sequence of DNA, there are 3 possible reading frames for each of the 2 strands.
Recall also that the 5’ and 3’ ends of a sequenced fragment are known from the
sequencing reaction, since the sequence is determined by termination of a growing
chain of nucleotides (see section 3.1), so the sequence can be written in the
conventional 5’ to 3’ direction. Hence, if we are given the following
sequence of nucleotides from a sequencing experiment
AACAAGCGAA TAGTTTTGTT
we have the following 3 sets of codons (and partial codons) that are possible on this
strand
AAC AAG CGA ATA GTT TTG TT
A ACA AGC GAA TAG TTT TGT T
AA CAA GCG AAT AGT TTT GTT
whereas the other strand, also read 5’ to 3’, would have the following nucleotides
(recalling that the two strands are complementary and run in opposite directions, so the
other strand is the reverse complement of the given sequence)
AACAAAACTA TTCGCTTGTT
from which we get the following 3 sets of possible codons
AAC AAA ACT ATT CGC TTG TT
A ACA AAA CTA TTC GCT TGT T
AA CAA AAC TAT TCG CTT GTT.
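As a minimal sketch of this bookkeeping, the Python below builds the other strand, splits each strand into its 3 reading frames (keeping the partial codons at either end, as in the listings above), and scans all six frames for a start codon. It assumes the read is written 5’ to 3’, contains only uppercase A, C, G and T, and takes ATG as the start codon (alternative start codons are ignored). The function names `reverse_complement`, `frame_split` and `find_start_codons` are hypothetical, chosen only for illustration.

```python
# Minimal sketch (assumptions: read written 5'->3', uppercase A/C/G/T only,
# ATG as the only start codon). Function names are hypothetical.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the other strand, written 5'->3': complement each base, then reverse."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def frame_split(seq: str, offset: int) -> list[str]:
    """Split seq into codons for the frame starting at `offset` (0, 1 or 2).

    Leading and trailing partial codons are kept as separate pieces,
    matching the layout used in the text.
    """
    pieces = [seq[:offset]] if offset else []
    for i in range(offset, len(seq), 3):
        pieces.append(seq[i:i + 3])
    return pieces


def find_start_codons(seq: str, start: str = "ATG") -> list[tuple[str, int, int]]:
    """Return (strand, frame offset, position) for every complete in-frame start codon.

    Positions are 0-based indices into the strand as written 5'->3', so hits
    on the "-" strand index into the reverse complement, not the original read.
    """
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Stop early enough that s[i:i + 3] is always a full codon.
            for i in range(offset, len(s) - 2, 3):
                if s[i:i + 3] == start:
                    hits.append((strand, offset, i))
    return hits


# Usage on the read from the text.
read = "AACAAGCGAATAGTTTTGTT"
for strand_seq in (read, reverse_complement(read)):
    for offset in range(3):
        print(" ".join(frame_split(strand_seq, offset)))
print(find_start_codons(read))  # [] : no ATG in any of the six frames
```

The six printed lines reproduce the two sets of three codon listings above. Scanning frame by frame, rather than searching for ATG anywhere in the string, records which strand and frame each hit belongs to, which is the information that is unknown when starting from a short read.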