ABSTRACT

Key Notes

DNA content, C-value paradox

The genome is the total of the nuclear DNA in a gamete. The genomes of eukaryotes vary greatly in the amount of nuclear DNA, but the quantity of DNA is not related to the number of genes. Much of the extra DNA consists of repeated sequences that appear to be parasitic ‘junk’ DNA.

The human genome

The human genome is 3 billion base pairs long and contains approximately 20 000 genes arranged on 23 chromosomes. Less than 1.5% of the DNA codes for amino acids. Gene-related sequences, including pseudogenes, introns, and control regions, account for 25% of the DNA. The rest is extragenic DNA.
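The percentages above can be turned into approximate base-pair counts. The following is a minimal sketch using only the figures quoted in this note; the variable names are illustrative, and treating the coding and gene-related fractions as non-overlapping is an assumption made here for the arithmetic.

```python
# Figures quoted in the note above; names are illustrative only.
GENOME_BP = 3_000_000_000          # "3 billion base pairs"

# Integer arithmetic keeps the base-pair counts exact.
coding_bp = GENOME_BP * 15 // 1000      # upper bound: "less than 1.5%"
gene_related_bp = GENOME_BP * 25 // 100  # "25% of the DNA"

# Assumption: coding and gene-related sequences do not overlap,
# so whatever remains is counted as extragenic DNA.
extragenic_bp = GENOME_BP - coding_bp - gene_related_bp
extragenic_fraction = extragenic_bp / GENOME_BP

print(f"coding:       {coding_bp:>13,} bp")
print(f"gene-related: {gene_related_bp:>13,} bp")
print(f"extragenic:   {extragenic_bp:>13,} bp ({extragenic_fraction:.1%})")
```

Under these assumptions the remainder comes to roughly 2.2 billion base pairs, in line with the figure of about 75% given for extragenic DNA.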

Genes

Single-celled eukaryotes have about 6000 genes; multicellular ones have 13 000 to 26 000. The coding information in eukaryote genes occurs as a series of exons separated by noncoding introns. Genes vary greatly in size and also with respect to the number and sizes of the introns. Leader and trailer sequences occur at the 5′ and 3′ ends of genes; these are transcribed but not translated. Upstream promoter sequences regulate gene transcription.

Gene families

Many genes occur as families containing multiple copies of genes with identical or related sequences. The genes in a family may be present at single or multiple loci. Gene families may also occur as individual clusters at multiple loci.

Pseudogenes

These are diverged members of gene families that have acquired one or more inactivating mutations. Processed pseudogenes are nontranscribed DNA copies of mRNAs, probably derived by a mechanism involving reverse transcription. Gene fragments are inactive genes that lack part of the parent gene. They are thought to have arisen by deletion or recombination of the original gene sequence.

Extragenic DNA

This is composed of sequences that are not genes, gene-related sequences, or pseudogenes and accounts for about 75% of the human genome. Most extragenic sequences (70–80%) are unique or exist as a small number of copies. The rest (20–30%) are moderately or highly repeated sequences present as tandem arrays or dispersed throughout the genome. Extragenic DNA has no known function.

Dispersed repetitive sequences

These consist of SINEs and LINEs (short and long interspersed nuclear elements, respectively). SINEs include human Alu sequences. These are a family of sequences about 250 bp long present as about 1 million highly dispersed copies. They are thought to be derived from processed pseudogenes that acquired the ability to move about the genome. LINEs are longer than SINEs. The human L1 LINE is 6500 bp and exists as 60 000 copies. LINEs are retroelements and have the ability to copy themselves using reverse transcriptase and to move about the genome.
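The copy numbers and unit lengths quoted for Alu and L1 give a rough idea of how much DNA these families occupy. The sketch below multiplies them out against the 3 billion base pair genome size given under "The human genome". It assumes every copy is full length, so the results should be read as upper estimates rather than measured values; the function name is hypothetical.

```python
GENOME_BP = 3_000_000_000  # human genome size quoted in these notes


def family_share(unit_bp: int, copies: int, genome_bp: int = GENOME_BP):
    """Return (total bp, fraction of genome) for a repeat family.

    Assumption: every copy is a full-length unit.
    """
    total_bp = unit_bp * copies
    return total_bp, total_bp / genome_bp


alu_bp, alu_fraction = family_share(unit_bp=250, copies=1_000_000)
l1_bp, l1_fraction = family_share(unit_bp=6_500, copies=60_000)

print(f"Alu: {alu_bp:,} bp, {alu_fraction:.1%} of the genome")
print(f"L1:  {l1_bp:,} bp, {l1_fraction:.1%} of the genome")
```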

Clustered repetitive sequences

Larger eukaryote genomes have extensive regions containing long tandem arrays of repetitive sequences. These are called satellite DNA. Arrays with short repeat units are classified as microsatellites (fewer than seven bases per repeat) or minisatellites (seven to about 25 bases); the rest are simply called satellite DNA, and their individual units may be thousands of base pairs long, usually with smaller repeats inside them. CA dinucleotide repeats and mononucleotide repeats account for 0.8% of the entire human genome.
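The size classes above amount to a simple rule on the length of the repeat unit. A minimal sketch follows; the function name is hypothetical, and the upper limit of 25 bases for minisatellites is treated as a hard cutoff here although the note gives it only as approximate.

```python
def classify_tandem_repeat(unit_length: int) -> str:
    """Classify a tandem repeat by the length of its repeat unit in bases."""
    if unit_length < 1:
        raise ValueError("unit_length must be at least 1")
    if unit_length < 7:
        return "microsatellite"   # fewer than seven bases per repeat
    if unit_length <= 25:
        return "minisatellite"    # seven to about 25 bases (cutoff is approximate)
    return "satellite"            # everything longer


for length in (1, 2, 7, 25, 171):
    print(length, classify_tandem_repeat(length))
```

A CA dinucleotide repeat has a unit length of 2 and so falls in the microsatellite class.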

Variable number tandem repeats (VNTRs)

VNTRs are tandemly repeated sequences in which the number of copies of the repeat unit varies. The variation occurs at a given locus between individuals. Polymerase chain reaction (PCR) can be used to detect the variations. VNTRs are used in forensic science to identify individuals at the scene of a crime and in medical genetics to identify carriers of genetic diseases.
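The idea that alleles at one locus differ only in copy number can be illustrated by counting back-to-back copies of a repeat unit in a sequence. This is a sketch for illustration, not a description of a laboratory or forensic method: the function name, the flanking sequences, and the two alleles are all invented.

```python
def longest_tandem_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of unit in sequence."""
    if not unit:
        raise ValueError("unit must be a non-empty string")
    best = 0
    step = len(unit)
    # Try every start position so that no reading frame is missed.
    for start in range(len(sequence) - step + 1):
        run = 0
        pos = start
        while sequence.startswith(unit, pos):
            run += 1
            pos += step
        best = max(best, run)
    return best


# Hypothetical alleles: the same invented flanks around 5 or 8 CA repeats.
LEFT_FLANK, RIGHT_FLANK = "GGT", "TTG"
allele_a = LEFT_FLANK + "CA" * 5 + RIGHT_FLANK
allele_b = LEFT_FLANK + "CA" * 8 + RIGHT_FLANK

print(longest_tandem_run(allele_a, "CA"))
print(longest_tandem_run(allele_b, "CA"))
print(len(allele_b) - len(allele_a))  # length difference between the alleles
```

The two alleles differ in length by three repeat units (6 bp). A length difference of this kind is what shows up as different PCR product sizes when primers flank the repeat array.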

Related topics

Genes

Concepts of genomics

Genetic diseases

Genetics in forensic science