ABSTRACT

Genome-wide association studies (GWAS) take advantage of the ease with which DNA sequences can now be obtained for large numbers of individuals from different populations; they use single-nucleotide polymorphism (SNP) arrays to detect SNPs associated with disease. Coalescent theory models the time it takes for the lineages of an allele to coalesce into a common ancestor, and the patterns of variation in populations under genetic drift. A coalescent analysis explores a large, broadly representative sample of plausible genealogical scenarios, and high-quality DNA sequence data from a random sample constitute its best input. Importance sampling or correlated sampling can be used to generate a collection of simulated genealogies. If a locus is under intense positive selection, loci closely linked to it are also affected; these are called hitchhiking loci. Hitchhiking loci can be neutral or even detrimental, yet they behave as if positive Darwinian selection were acting on them. Hard selective sweeps can be detected by the extended haplotype approach, by the site frequency spectrum (SFS) approach, by measures of population subdivision, and by reduced variability in regions of the genome. Soft sweeps are more difficult to detect. Phylogenetic shadowing identifies regulatory elements in DNA sequences. Regions of the human genome that have experienced accelerated evolution can be detected and compared. Regions that are both strongly conserved and rapidly deleted are also of interest.
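To make the idea of "time to coalesce" concrete, the sketch below simulates genealogies under the standard Kingman coalescent. It assumes a constant-size diploid population with no selection, recombination, or structure, and it measures time in units of 2N generations. It draws genealogies directly from the neutral prior; it is not the importance-sampling or correlated-sampling machinery mentioned above, and the function names are hypothetical. With k lineages, each of the k(k-1)/2 pairs coalesces at rate 1, so the expected time to the most recent common ancestor (T_MRCA) of n copies is 2(1 - 1/n), and the expected total branch length is 2 times the sum of 1/i for i = 1..n-1.

```python
import random


def simulate_coalescent(n, rng):
    """Simulate one neutral Kingman-coalescent genealogy for n gene copies.

    Assumption: constant-size diploid population, time in units of 2N generations.
    Returns (t_mrca, total_branch_length).
    """
    t_mrca = 0.0
    total_length = 0.0
    k = n
    while k > 1:
        # Each of the k(k-1)/2 pairs coalesces at rate 1, so the waiting
        # time until the next coalescence is exponential with this rate.
        rate = k * (k - 1) / 2
        wait = rng.expovariate(rate)
        t_mrca += wait
        total_length += k * wait  # k branches persist through this interval
        k -= 1  # two lineages merge into one ancestor
    return t_mrca, total_length


def expected_t_mrca(n):
    """Analytical E[T_MRCA] in units of 2N generations."""
    return 2.0 * (1.0 - 1.0 / n)


def expected_total_length(n):
    """Analytical expected total branch length in units of 2N generations."""
    return 2.0 * sum(1.0 / i for i in range(1, n))


if __name__ == "__main__":
    rng = random.Random(42)
    n, reps = 10, 20000
    sims = [simulate_coalescent(n, rng) for _ in range(reps)]
    mean_t = sum(t for t, _ in sims) / reps
    mean_len = sum(length for _, length in sims) / reps
    print(f"mean T_MRCA {mean_t:.3f} (expected {expected_t_mrca(n):.3f})")
    print(f"mean length {mean_len:.3f} (expected {expected_total_length(n):.3f})")
```

Because neutral mutations fall on branches in proportion to their length, the expected total branch length links genealogies to observable data: with θ = 4Nμ, the expected number of segregating sites is θ times the sum of 1/i for i = 1..n-1, which is the basis of Watterson's estimator and of SFS-based comparisons against neutral expectations.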