ABSTRACT

If the draft of the human genome sequence marks the establishment of genomic technologies in biology, it also ushers in the analysis of human genetic variation in unprecedented detail (1-3). Even as these studies begin, common genetic variation in humans (>1% allele frequency) is already recognized as taking predominantly the form of single-nucleotide polymorphisms (SNPs), and one large collaborative initiative has already identified millions of these small elements of genetic diversity (4, 5). Although SNPs in any part of the genome can potentially affect biological function, the ones with the most profound effects are likely to map to the coding sequences of genes and to the sequences that control gene expression, e.g., promoters, enhancers, and sequences important for pre-mRNA splicing. Of the millions of SNPs in the population, perhaps 250,000-400,000 map to coding sequences (coding SNPs, or cSNPs) (6, 7). Both synonymous cSNPs, which do not alter the amino acid sequence of proteins, and nonsynonymous cSNPs (nsSNPs), which do, can affect biological function, but genetic arguments favor greater functional consequences for nsSNPs on average (6-8). This difference reflects the more direct impact of nsSNPs on protein stability and biological activity. Consequently, nsSNPs are obvious candidates for analysis in the study of fundamental processes in human evolution, of the genetic basis of disease, and, in applied research, of the genetic influences on drug response (pharmacogenomics) (9). A major challenge for research in the post-genome era is to link SNPs to their effects on biological function as rapidly and accurately as possible.