ABSTRACT

Program of Genes and Diseases, Center for Genomic Regulation, Barcelona, Spain

1. INTRODUCTION

In the past few years there has been significant progress towards completing the sequences and beginning to characterize the content of the genomes of several mammalian species. The most notable advances have been made in the study of the human genome. The availability of a near-finished reference DNA sequence has been crucial for clinical genetic and genomic investigations because, for the first time, it provides a common template for comprehensive comparative studies aimed at cataloguing genotypes and their influence on phenotypic outcomes. Although much progress has been made in this endeavor (2000 disease genes or associated variants have been identified), there are still some 4000 genetic diseases for which the molecular etiology is unknown. There are also numerous phenotypic traits in the apparently "healthy" population that can have a strong genetic component; an important example is the set of genetic factors affecting drug metabolism.

Genetic variation in the human genome has, until recently, mainly been studied either at the single-nucleotide level or at the karyotypic level. The most common class of variation consists of single-nucleotide (nt) substitutions. These mostly benign changes are now well studied, with an estimated 11 million single-nucleotide polymorphisms (SNPs) currently described in the human population. Most of the Human Genome Project-coordinated efforts to catalogue common variants in the genome sequence have focused on SNPs. Small insertions and deletions are also usually grouped into this category. Genomic variation detected by karyotyping includes larger tracts of usually contiguous DNA that can vary in copy number (deletions and duplications), distribution (translocations and insertions), or orientation (inversions) along the chromosomes. In most cases these large genomic rearrangements are associated with clinical outcomes. To date, there has been no exhaustive assessment of the frequency, extent, or distribution of variants in the kilobase (kb) to megabase (Mb) size range, mainly because robust genome-scanning technologies at this resolution have been lacking. However, recent technical advances partially address this problem: they use the genome sequence as a reference substrate, allowing rapid assessment of gains or losses of sequence along chromosomes. As data accumulate, it is becoming increasingly apparent that these so-called large-scale copy number variants (LCVs), often averaging hundreds of kb in size, are present in the genomes of apparently healthy individuals at a much higher frequency than originally thought. In many cases these variants partially or entirely encompass genes and can therefore alter gene copy number. Moreover, some of them overlap with nearly identical segmentally duplicated DNA (called low-copy repeats or duplicons).
Given that segmental duplications (and possibly LCVs) are implicated in a growing list of more than 30 human diseases that arise from the gain, loss, or disruption of dosage-sensitive genes or regulatory regions, these new observations may also be relevant to other unresolved genetic diseases. In this chapter we describe three different categories of genomic variation and discuss how each may influence human disease. The different types of variation are outlined in Figure 1. We first describe what is known about genomic disorders and the mechanisms that cause them. We then discuss how similar molecular events may underlie certain phenotypic variation and susceptibility to common complex diseases, as well as influence the dynamic structure of the human genome.