Statistical Approaches to Analysis of Polymorphisms in Multifactorial Conditions

ABSTRACT

The vast majority (99.9%) of the recently deciphered human genome sequence, which is approximately 3 billion nucleotides long, is identical between individuals. The remaining 0.1% accounts for the genetic diversity between individuals. Potential sources of this diversity include:

Single nucleotide polymorphism (SNP): the alleles differ at a single nucleotide, e.g. C/A

Indel: insertion or deletion of one or a few nucleotides, e.g. TTA/- (TTA present on one allele and absent on the other)

Repeat polymorphism: the alleles differ in the number of times a basic motif of two to five nucleotides (microsatellite repeat) or of several tens of nucleotides (minisatellite, variable number tandem repeat or VNTR) is repeated, e.g. (CA)12/(CA)13/(CA)14

Structural variation: deletions, duplications, or inversions of up to hundreds of kilobases of sequence.1-4
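The first three classes can, in simple cases, be told apart from the allele sequences alone. The sketch below is a simplified illustration, not a standard variant-calling routine: the function names `count_tandem_repeats`, `is_pure_repeat` and `classify_allele_pair` are hypothetical, alleles are assumed to be plain A/C/G/T strings observed at the same position, a deleted allele is written as an empty string, and structural variants are left out because recognizing them requires alignment against a reference sequence.

```python
# Hypothetical helpers illustrating the polymorphism classes above (simplified).
# Alleles are A/C/G/T strings observed at the same position; "" = deleted allele.


def count_tandem_repeats(seq: str, motif: str) -> int:
    """Count consecutive copies of `motif` at the start of `seq`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    n = 0
    while seq.startswith(motif, n * len(motif)):
        n += 1
    return n


def is_pure_repeat(seq: str, motif: str) -> bool:
    """True if `seq` consists only of whole copies of `motif`."""
    return count_tandem_repeats(seq, motif) * len(motif) == len(seq)


def classify_allele_pair(a: str, b: str, motif: str = "") -> str:
    """Roughly classify two alleles; `motif` is the repeat unit, if known."""
    if a == b:
        return "identical"
    if len(a) == 1 and len(b) == 1:
        return "SNP"  # e.g. C/A
    if motif and is_pure_repeat(a, motif) and is_pure_repeat(b, motif):
        return "repeat polymorphism"  # e.g. (CA)12/(CA)13
    if a.startswith(b) or b.startswith(a):
        return "indel"  # one allele lacks nucleotides the other carries
    return "other"


print(classify_allele_pair("C", "A"))                          # SNP
print(classify_allele_pair("TTA", ""))                         # indel
print(classify_allele_pair("CA" * 12, "CA" * 13, motif="CA"))  # repeat polymorphism
print(count_tandem_repeats("CA" * 14 + "GT", "CA"))            # 14
```

Checking the repeat class before the indel class matters here, because a difference in repeat number is, at the sequence level, also an insertion or deletion of whole motif copies.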

SNPs are usually biallelic (binary) and lend themselves well to accurate, automated high-throughput genotyping, as described in Chapter 6. Moreover, they are widely distributed throughout the genome. When two chromosome copies are randomly selected from a population, the probability that a given nucleotide position differs between them is 7.1-7.5 × 10⁻⁴, corresponding to one variant every 1300-1400 nucleotides.5,6 Some of these differences are more or less specific to an individual or family (rare variants), whereas others are observed in many individuals (common variants). The total number of common SNPs in the human population, i.e. SNPs for which both alleles have a frequency of >1%, is expected to be 10-15 million.5,6
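The figures above are related by simple arithmetic: the mean spacing between differing sites is the reciprocal of the per-site probability of a difference, and "common" means that the less frequent (minor) allele exceeds 1%. A minimal Python sketch, assuming a biallelic SNP and genotype counts from a sample of unrelated individuals; the function names and the `threshold` parameter are hypothetical.

```python
# Arithmetic behind the SNP figures quoted in the text (hypothetical helper names).


def mean_spacing(diversity: float) -> float:
    """Average distance in nucleotides between sites that differ."""
    return 1.0 / diversity


def minor_allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Minor allele frequency of a biallelic SNP from genotype counts."""
    n_alleles = 2 * (n_AA + n_Aa + n_aa)   # each individual carries two copies
    freq_A = (2 * n_AA + n_Aa) / n_alleles
    freq_a = (2 * n_aa + n_Aa) / n_alleles
    return min(freq_A, freq_a)


def is_common(maf: float, threshold: float = 0.01) -> bool:
    """Common SNP: both alleles, hence also the minor one, above 1%."""
    return maf > threshold


for d in (7.1e-4, 7.5e-4):
    print(f"{d:.1e} -> one difference every {mean_spacing(d):.0f} nt")
# 7.1e-04 -> one difference every 1408 nt
# 7.5e-04 -> one difference every 1333 nt

maf = minor_allele_frequency(900, 95, 5)  # illustrative counts, not real data
print(round(maf, 4), is_common(maf))      # 0.0525 True
```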

Human genetic diversity as a whole contributes, together with environmental factors, to the wide range of phenotypic variation observed between individuals, including differences in susceptibility to multifactorial disorders, disease course, and response to treatment. The goal of genetics is to pinpoint the DNA variants that contribute most significantly to the population variation in each trait.

Although it is currently not feasible to test all genetic variants directly, the human genome itself offers an alternative, indirect strategy, based on the phenomenon that variants located close to each other on the same chromosome are not inherited independently of each other. An assayed variant can therefore act as a surrogate marker and "mark" the presence of a nearby, unassayed trait-contributing variant with which it is correlated, i.e. with which it is in linkage and possibly in linkage disequilibrium (Figure 3.1). The degree of linkage and linkage disequilibrium between the marker and the trait locus is pivotal to the success of this indirect strategy, and the recent dramatic advances in technology and in knowledge of the human genome are beneficial in this respect. However, biological factors typical of complex disorders, such as low penetrance or small effect sizes, the influence of other genes and of environmental factors, and heterogeneity among individuals with the trait, partly obscure the relation between variant and trait. These factors continue to make complex traits challenging to unravel, in contrast to the repeated successes seen for Mendelian disorders (Figure 3.1).
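How well a marker "marks" a nearby variant is commonly summarized with two measures of linkage disequilibrium: Lewontin's D' and the squared correlation r² between alleles at the two loci. The sketch below computes both from haplotype and allele frequencies. It assumes those frequencies are already known (in practice they are estimated from genotype data), and the function name `ld_measures` is hypothetical.

```python
def ld_measures(p_AB: float, p_A: float, p_B: float):
    """Return (D, D', r^2) for two biallelic loci.

    p_AB: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_A, p_B: frequencies of alleles A and B (strictly between 0 and 1)
    """
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both loci must be polymorphic")
    # Coefficient of linkage disequilibrium: deviation from independence
    D = p_AB - p_A * p_B
    # Largest |D| attainable given the allele frequencies (Lewontin)
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = D / d_max  # signed, in [-1, 1]; |D'| is often reported
    # Squared correlation between allele indicators at the two loci
    r2 = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, d_prime, r2


# Marker allele B always travels with trait allele A: complete LD
print(ld_measures(p_AB=0.3, p_A=0.3, p_B=0.3))
# Independent loci: haplotype frequency equals the product of allele frequencies
print(ld_measures(p_AB=0.2, p_A=0.5, p_B=0.4))
```

An r² of 1 means the marker is a perfect proxy for the trait-contributing variant. As a rough rule of thumb, testing a marker instead of the variant itself requires about 1/r² times the sample size for the same power, which is why the degree of linkage disequilibrium is so important for the indirect strategy.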
