ABSTRACT

Sequences play a major role in biology as a means of abstraction. For example, deoxyribonucleic acid (DNA), the carrier of genetic information in the nucleus, and proteins, a main component of the cell responsible for most biological activity, can be represented as sequences over alphabets of four and twenty characters, respectively. This is possible because these molecules are biopolymers: large organic molecules assembled from small building blocks of the same kind, called monomers, which are linked together into long chains. The monomers of nucleic acids such as DNA or RNA (ribonucleic acid) are nucleotides, and each nucleotide contains one of four possible nucleobases. The structure of a nucleic acid strand is therefore defined by the sequence of bases in its nucleotides. Proteins, on the other hand, are composed of amino acids. Twenty different kinds of amino acids occur in natural proteins. They all share the same backbone and differ in their side chains (residues). In proteins, these amino acids may occur in any order and number. We call the information about the succession of the monomers in a nucleic acid or protein its biological sequence, and we thus consider these biopolymers a kind of storage for this information. Many functions fulfilled by biopolymers such as nucleic acids and proteins depend on their sequence. A DNA sequence, for example, encodes genes, which are construction plans for proteins. The cell first transcribes a gene into messenger RNA (mRNA), which is then, after some modifications, translated into a peptide, where every three nucleobases form a codon that corresponds to one specific amino acid in the synthesized protein. The sequence of nucleotides in the DNA therefore defines the order of amino acids in the protein, which in turn determines the three-dimensional shape the protein folds into. RNA may also fold into a structure that is crucial for its function in the cell. Moreover, the strength of molecular binding between proteins and nucleic acids depends on their sequences; protein synthesis, for example, involves certain proteins that can dock only at specific patterns in the DNA.
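
The mapping from a DNA sequence to a protein sequence described above can be illustrated with a small sketch. It is only an illustration under simplifying assumptions, not a model of the cellular machinery: the input is taken to be the coding strand of a gene, the modifications of the mRNA are ignored, translation starts at the first base, and the standard genetic code is assumed. The names transcribe, translate and CODON_TABLE are hypothetical and chosen for this example only.

```python
from itertools import product

# Standard genetic code, written as one amino acid letter per codon.
# Codons are enumerated in the order T, C, A, G for each of the three
# positions (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA sequence codon by codon until a stop codon."""
    protein = []
    # Step through the sequence three bases at a time; an incomplete
    # trailing codon is ignored.
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends the peptide
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCTGA"
    mrna = transcribe(dna)
    print(mrna)             # AUGGCCUGA
    print(translate(mrna))  # MA
```

Here the nine bases form three codons: AUG (methionine, M), GCC (alanine, A) and UGA (stop), so the resulting peptide has the sequence MA.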