ABSTRACT

The generation of a draft sequence of the human genome has heralded a new age in the investigation of the contribution of genetic variation to human disease and phenotypes (1). Quickly building on the scaffold of the draft sequence, there has been a rush to annotate genetic variation and expression across the genome. Some of this work has been accomplished through large international consortial efforts [e.g., the International HapMap Project or the Encyclopedia of DNA Elements (ENCODE)], whereas smaller-scale programs and countless individual studies have provided detailed analyses of genetic variation in specific genes or regions of the genome (2-5). In parallel with a growing appreciation of the unexpectedly large scope of genetic variation in the genome, new commercial technical platforms with fixed content have been developed that can survey thousands of variants at once. These platforms have expanded the opportunities for analysis, enabling investigation of common germline genetic variation across the genome in what are now known as genomewide association studies (GWAS) (6). To keep pace with the generation of dense data sets, new computational and analytical processes continue to be developed, usually to address problems of efficiency or excessive computational requirements (Fig. 1).