ABSTRACT

Key Notes

Overview

Bioinformatics is the collection, storage, analysis, collation, and use of biological information. The data include mapping and phenotype information and nucleotide and amino acid sequences, as well as protein structure, function, and expression data. They come from cDNA (RNA), local and genomic DNA sequencing, amino acid sequencing of polypeptides, and structural studies by X-ray diffraction and nuclear magnetic resonance, together with the results of earlier analyses. The objective is to predict structure and function from sequence data, and so to speed improvements in medical treatment and drug design. Sequence data are also very useful for constructing phylogenies and investigating evolutionary relationships.

Databanks

The information is stored logically in databanks which include databases and the tools to access the data. These are accessed via the Internet. Typical questions arise when a new sequence or structure is tested against the database to look for similarities to the deposited data. Information about the matching entries in the database may be applicable to the new sequence or structure. Some databank URLs are in the text.

Techniques of alignment

Algorithms are designed to test for similarities between two sequences (pairwise alignment). In a perfect match the two sequences would align residue for residue along their whole length. Amino acid sequences are generally more informative than nucleotide sequences for this purpose: they are more conserved, and with 20 amino acids against only four bases, chance matches are less frequent.
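As an illustration of pairwise alignment, the following is a minimal sketch of a global alignment by dynamic programming (the Needleman-Wunsch approach, named here for illustration; it is not the method used by any particular databank tool). The scoring values (match +1, mismatch -1, gap -2) and the function names are assumptions chosen for the example. The helper expected_chance_identity assumes all letters are equally frequent and independent, which real sequences only approximate.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global pairwise alignment.

    The scoring scheme is an illustrative assumption, not a standard matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning residue a[i-1] with residue b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of substitution or a gap.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two residues
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


def expected_chance_identity(alphabet_size):
    """Chance that two random letters match, assuming equal frequencies."""
    return 1 / alphabet_size


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "ACT"))
    print(expected_chance_identity(4), expected_chance_identity(20))
```

Under the equal-frequency assumption, two random bases match one time in four, whereas two random amino acids match one time in twenty, which is why a given level of identity is more significant in a protein alignment.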

The tools used

BLAST (Basic Local Alignment Search Tool) is currently the principal pairwise alignment search program. Multiple alignments are possible, and iterative searches can combine data from different partial matches to identify distant relationships. PSI-BLAST (Position-Specific Iterated BLAST) uses matches from pairwise searches to refine subsequent searches and can detect alignments between more divergent sequences. Comparison of protein structure can detect great conservation even after extensive evolutionary divergence. DALI (Distance-matrix ALIgnment) can detect structural similarities when very little amino acid identity remains.

Predicting structure

The DNA sequence determines the amino acid sequence (primary structure) of the polypeptide or protein. This in turn influences the secondary structure (alpha helix, beta pleated sheet, flexible loops, and turns). These structures fold into the tertiary structure, mainly energetically driven by hydrogen bonding both between amino acid residues and between the amino acids and water molecules. Hydrophobic regions are excluded from solution by the hydrogen bonding between the water molecules and are forced into the center of the protein, or into lipid membranes.
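One simple way to relate sequence to folding is a hydropathy profile, which averages a per-residue hydrophobicity value over a sliding window. The sketch below uses the Kyte-Doolittle hydropathy scale as an assumed choice (the text does not name a scale); the function name and the window length of 5 are also illustrative assumptions.

```python
# Kyte-Doolittle hydropathy values (positive = more hydrophobic).
# The choice of this scale is an assumption made for illustration.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=5):
    """Return the mean hydropathy of each window along a protein sequence.

    Raises KeyError if the sequence contains a non-standard residue code.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    # One average per full window; sequences shorter than the window give [].
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical peptide: a hydrophobic stretch flanked by charged residues.
    for position, mean in enumerate(hydropathy_profile("DEKRLLIVVLLAKRDE")):
        print(position, round(mean, 2))
```

Windows with strongly positive averages mark hydrophobic stretches, while strongly negative averages mark hydrophilic ones. A profile of this kind is only a rough guide and does not replace a structure prediction.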

Predicting binding and function

The structure of the protein, and particularly the distribution of its electron cloud, is responsible for the protein's function as a catalyst and in binding to other molecules. Drugs can be designed to interact with specific parts of specific proteins to produce specific effects if pharmacologists have sufficient knowledge of the protein's structure. Bioinformatics hopes to speed the path from sequence or structural data to the production of efficient and profitable drugs.

Comparative bioinformatics

The wealth of data available, including mRNA and protein expression data, allows comparison between cells and tissues within an organism (including cancerous tissue and different developmental stages), between individuals for population genetics and epidemiological purposes, and between species for evolutionary purposes.

Related topics

Concepts of genomics

Prokaryotic genomes

Eukaryote genomes

Phylogeography, molecular clocks, and phylogenies

Using sequence specificity to study nucleic acids

Genetic diseases

Genes and cancer

Biotechnology