ABSTRACT
The generation of a draft sequence of a human genome has heralded a new age in the
investigation of the contribution of genetic variation to human disease and phenotypes (1).
Building on the scaffold of the draft sequence, there has been a rush to annotate
genetic variation and expression across the genome. Some of this work has been
accomplished through large international consortia [e.g., the International
HapMap Project or the Encyclopedia of DNA Elements (ENCODE)], whereas smaller-scale
programs and countless individual studies have provided detailed analyses of
genetic variation in specific genes or regions of the genome (2-5). In parallel with a
growing appreciation of the unexpectedly large scope of genetic variation in the genome,
commercial platforms with fixed content have been developed
that can survey thousands of variants at once. This has expanded the opportunities for
analysis, enabling investigation of common germline genetic variation across the genome
in what are now known as genomewide association studies (GWAS) (6). To keep pace
with the generation of dense data sets, new computational and analytical processes
continue to be developed, usually to address problems of efficiency or
excessive computational requirements (Fig. 1).