ABSTRACT

Most chronic diseases arise from adverse effects at multiple points in behavioral, biochemical, genetic, and physiological systems, often from multiple exposures and across various life stages. There is also tremendous interindividual variability in health outcomes in response to environmental and stochastic factors. This has hindered the ability of the scientific community to pinpoint why certain individuals develop a chronic disease when exposed to environmental and stochastic factors while others remain healthy. Recent advances in bioinformatics and molecular genetic tools have provided an opportunity to understand how the genetic and epigenetic variability of an individual interacts with environmental and stochastic factors to either preserve health or cause disease. Emerging consensus indicates that susceptibility to many complex diseases is influenced by the interactions of unique inherited DNA sequences, and of variations in the epigenetic and consequent biochemical milieu of germ and somatic cells, with environmental and stochastic factors during the intrauterine, postnatal, childhood, and adult life of an individual. Explanations based on DNA sequence variation alone are inadequate to account for differences in complex chronic disease outcomes, given the low DNA sequence variation among unrelated individuals and the experimental results from closely related mammalian models. Despite much information on both genetic and environmental disease risk factors, there are relatively few examples of reproducible gene-environment interactions (GEI). There is also currently a lack of computational and bioinformatics methods that can reduce large and diverse environmental, epigenetic, epidemiological, and “-omics” data sets into representations that can be interpreted in a biological context. This chapter presents an integrated bioinformatics, biostatistics, and molecular epidemiologic approach to studying the relative contributions of environmental, epigenetic, genetic, and stochastic factors in transdisciplinary molecular epidemiological studies to determine the causality and progression of complex chronic disease phenotypes.
5.1 Introduction

Complex chronic diseases with a multifactorial etiology, such as Alzheimer’s, autism, diabetes, cancer, hypertension, Parkinson’s, and several neurodevelopmental and mental health disorders, have become the dominant public health burden. The interplay of multiple genetic and epigenetic variations and environmental and stochastic factors influencing biological pathways and networks contributes to the susceptibility to and development of complex diseases, as well as differences in treatment responses [1]. As depicted in Fig. 5.1, it is the combined contributions or cancellations of the effects of a multitude of genetic, epigenetic, and environmental stressors and factors that lead to the development and progression of complex diseases over a period of time.

Figure 5.1 Development and progression of chronic and complex diseases is the result of combined contributions of a multitude of genetic, epigenetic, and environmental stressors and factors over a period of time.

Therefore, it is critical to design an integrated approach that includes biostatistics, bioinformatics, and molecular, genetic, and epidemiologic aspects to unravel the link between the environment and health and to understand susceptibility and resistance to the development of complex chronic diseases. An integrated approach will also help describe individual variations in responses to therapeutic interventions. It is now recognized that a complex chronic disease outcome in an individual results from the interplay of genes and environmental factors, leading to missing hormones and altered epigenomes in rogue cells. The interacting environment and genes that influence the origins of complex chronic diseases are also not necessarily the same as those that contribute to chronic disease progression or, for example, to the metastasis of cancer. Susceptibility gene variants for each specific disease are being identified, with emerging evidence of gene-environment interactions (GEI). Although most chronic diseases are the result of complex interactions between genes (G) and environmental (E) factors, the majority of the analytical approaches adopted for genetic linkage and association studies do not incorporate interactive effects with environmental factors. Studies have indicated that failure to account for GEI in complex chronic disease association analyses can decrease the power to find genetic disease loci and lead to underestimation of both genetic and environmental contributions to the origin and progression of complex diseases. The failure to replicate many genetic association studies is believed to be due, in part, to the omission of functional aspects of GEI from the study designs. The potential for detecting gene function(s) can be enhanced by taking into account environmental agents that are already known to play a key role in the disease etiology, particularly if these interactions are already known (e.g., tobacco smoking is a known environmental risk factor for lung or colorectal cancer). The success of genetic studies in identifying genetic variants for complex diseases will, therefore, depend on the further development of methods and analytical tools that can incorporate these complex functional interactions. Additionally, to detect, characterize, and interpret GEI that may cause complex diseases, innovative strategies and new tools are necessary to test multiple genes and multiple environmental risk factors, along with standard linkage analysis tools and candidate gene approaches. Although a limited set of functional tools is available to identify genetic variability and genetic factors in an individual, it remains difficult to define the environmental and stochastic agents to which an individual is exposed during his or her life span. Many genetic approaches are being utilized to understand disease susceptibility. These advancements are necessitating a shift in our scientific strategies for studying risk factors for chronic and complex diseases. Complex chronic diseases are not a single disease; there are hundreds of them, or even more. For example, cancer is not caused by one agent or one environmental factor. To develop an integrative approach to measuring the contributions of gene and environmental stressors to the chronic disease outcome, it is important to consider that an individual is exposed to countless environmental stressors during development from a single cell through the prenatal stages, adolescence, and adulthood, and across the complete life span.
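The claim that ignoring GEI can lead to underestimated genetic effects can be illustrated with a small worked example. The sketch below uses entirely hypothetical case-control counts (not data from any study) and a hypothetical `Stratum` helper to compare the carrier odds ratio within each exposure stratum against the odds ratio obtained when the exposure is ignored and the strata are pooled. It assumes a simple binary genotype (carrier versus non-carrier) and a binary exposure, and it measures interaction on the multiplicative scale only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """Case-control counts for carriers and non-carriers (hypothetical data)."""

    case_carrier: int
    case_noncarrier: int
    control_carrier: int
    control_noncarrier: int

    def odds_ratio(self) -> float:
        """Odds ratio of disease for carriers versus non-carriers."""
        return (self.case_carrier * self.control_noncarrier) / (
            self.case_noncarrier * self.control_carrier
        )

    def __add__(self, other: "Stratum") -> "Stratum":
        """Pool two strata, i.e. ignore the exposure that separates them."""
        return Stratum(
            self.case_carrier + other.case_carrier,
            self.case_noncarrier + other.case_noncarrier,
            self.control_carrier + other.control_carrier,
            self.control_noncarrier + other.control_noncarrier,
        )


# Hypothetical counts: the variant has no effect without the exposure
# and a clear effect with it.
unexposed = Stratum(case_carrier=50, case_noncarrier=50,
                    control_carrier=50, control_noncarrier=50)
exposed = Stratum(case_carrier=80, case_noncarrier=40,
                  control_carrier=40, control_noncarrier=80)

pooled = unexposed + exposed

# Multiplicative interaction: ratio of the stratum-specific odds ratios.
interaction_or = exposed.odds_ratio() / unexposed.odds_ratio()

print(f"OR in unexposed: {unexposed.odds_ratio():.2f}")
print(f"OR in exposed:   {exposed.odds_ratio():.2f}")
print(f"OR when pooled:  {pooled.odds_ratio():.2f}")
print(f"Interaction OR:  {interaction_or:.2f}")
```

With these illustrative counts, the pooled odds ratio falls between the two stratum-specific values, so an analysis that omits the exposure would understate the genetic effect among exposed individuals. A real analysis would typically fit a regression model with an interaction term and adjust for confounders; this sketch only shows the arithmetic behind the argument.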
The environment comprises everything that surrounds us, both internally and externally, including toxicants, hormones, diet, psychosocial behaviors, and lifestyles. Like genes, these factors also interact among themselves. A single exposure to an internal or external environmental factor cannot alone explain the development of a complex chronic disease; rather, it appears that exposure to multiple environmental and stochastic factors across the life span, together with their interactions, influences the development of a chronic disease in an individual. The temporal and spatial environmental modulations of the normal genetic and phenotypic changes in a cell lead to the development of a particular type of disease phenotype. In the realm of GEI, which affect disease phenotypes, toxicants such as tobacco smoke and alcohol, redox state, hormones, and diet are the best-studied environmental factors. Traditionally, it has been believed that an orderly progression of changes in disease stem cells through a recognizable series of intermediate states leads toward a predictable end point, the disease, in equilibrium with the prevailing environment [1]. In contrast, a more recent view is based on adaptations of independent disease stem cells. In this view, transitions between different states of disease stem cells are disorderly and unpredictable, resulting from probabilistic processes such as invasion, death, differentiation, and survival of the disease stem cells that make up rogue tissue or a malignant lesion. This reflects the inherent variability observed in the behaviors of the different types of cells present in the affected tissue in time and space, and the uncertainty of environmental and stochastic factors. In particular, it allows for a succession of alternative pathways and end points that depend on the chance outcome of gene-gene interactions (G × G), environment-environment interactions, GEI, and interactions between cells and their environment. An understanding of the interactions between multiple genetic, epigenetic, environmental, and stochastic factors will allow more accurate prediction of the probability of disease risk and of variations in therapeutic response [2]. Integrated GEI mechanisms will also explain the development and progression of a complex disease better than correlations with any single genetic, epigenetic, stochastic, or environmental factor.
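The probabilistic view of cell-state transitions described above can be made concrete with a toy model. The following is a minimal sketch, assuming a discrete-time Markov chain with four hypothetical states and made-up transition probabilities; neither the states nor the numbers come from the chapter or from any data set. It shows how the same starting point yields a probability distribution over alternative end points rather than a single predictable outcome.

```python
# Hypothetical cell states; "lesion" and "cleared" are absorbing end points.
STATES = ("normal", "altered", "lesion", "cleared")

# Hypothetical per-step transition probabilities (each row sums to 1).
TRANSITIONS = {
    "normal": {"normal": 0.9, "altered": 0.1},
    "altered": {"normal": 0.2, "altered": 0.5, "lesion": 0.2, "cleared": 0.1},
    "lesion": {"lesion": 1.0},
    "cleared": {"cleared": 1.0},
}


def step(dist):
    """Advance a probability distribution over states by one time step."""
    new = {state: 0.0 for state in STATES}
    for src, p_src in dist.items():
        for dst, p_move in TRANSITIONS[src].items():
            new[dst] += p_src * p_move
    return new


def evolve(dist, n_steps):
    """Apply `step` repeatedly and return the resulting distribution."""
    for _ in range(n_steps):
        dist = step(dist)
    return dist


start = {"normal": 1.0, "altered": 0.0, "lesion": 0.0, "cleared": 0.0}
after_two = evolve(start, 2)
after_fifty = evolve(start, 50)

for label, dist in (("2 steps", after_two), ("50 steps", after_fifty)):
    summary = ", ".join(f"{state}={dist[state]:.3f}" for state in STATES)
    print(f"After {label}: {summary}")
```

In such a model, environmental or genetic factors would enter by changing the transition probabilities, which shifts the odds of each end point without determining the outcome for any individual cell population.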
The integration of genomics, proteomics, transcriptomics, and metabolomics to identify important perturbations of normal biological pathways, networks, and systems influenced by environmental factors is necessary to understand the mechanistic role of the environment in complex chronic diseases, including cancers of different origin [2, 3]. Genetic, epigenetic, epidemiologic, biostatistics, and bioinformatics approaches need to be integrated to develop study designs and analytical strategies for identifying G × G and GEI in molecular epidemiologic and genomic studies, so that this understanding can be applied to complex chronic diseases. This chapter presents an integrated bioinformatics, biostatistics, and molecular epidemiologic approach to studying the relative contributions of environmental, epigenetic, genetic, and stochastic factors in transdisciplinary molecular epidemiological studies to determine the causality and progression of complex chronic disease phenotypes.

5.2 Tools for Identifying Genetic, Environmental, and Stochastic Factors Relevant in Complex Human Diseases

The availability of the human genome sequence and advances in the technologies for genomic analysis have allowed the investigation of genetic variance to be expanded to probe susceptibility or resistance to complex diseases [4]. Variations in genes can occur at several genomic levels, including in single nucleotides, small stretches of DNA (microsatellites), whole genes, regulatory elements, noncoding areas, and structural components of chromosomes or complete chromosomes, which then influence G × G and GEI in response to various environmental exposures. The basis of genetic evaluation is the identification of the allelic variants of human genes. DNA variations that may be associated with a complex disease are continuously being catalogued and mapped. Measuring the frequency of these DNA variants in different susceptible or resistant populations determines the magnitude of the associated risk due to their interactions with other genes and the environment in the initiation and progression of complex chronic diseases. Currently available “-omics” (proteomics, transcriptomics, interactomics, etc.) tools may aid in the identification of alterations that result in loss or gain of functions that contribute to the disease outcome. For example, complex diseases such as cancer might be more susceptible to lower-level, or “softer,” forms of variation, such as variation in noncoding sequences and copy number, which alter gene dosage without abolishing gene function. It has recently been shown that DNA variants associated with disease traits are concentrated in noncoding regulatory regions of the human genome that are marked by hypersensitivity to the enzyme deoxyribonuclease I (DNase I) [5]. When appropriately integrated with other molecular, cellular, physiological, and environmental data, such information may improve the way we understand normal conditions and diagnose and treat complex chronic diseases.
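The step of measuring variant frequencies in different populations to gauge the associated risk can be sketched as follows. This is a minimal illustration, assuming a single biallelic variant, hypothetical genotype counts for cases and controls, and an allele-based comparison that treats the two alleles of each person as independent observations (roughly, Hardy-Weinberg equilibrium). The function names are illustrative, and the confidence interval uses the standard log-scale (Woolf) approximation.

```python
import math


def allele_counts(n_ref_hom, n_het, n_var_hom):
    """Return (reference, variant) allele counts from diploid genotype counts."""
    return 2 * n_ref_hom + n_het, 2 * n_var_hom + n_het


def variant_frequency(n_ref_hom, n_het, n_var_hom):
    """Frequency of the variant allele in a group of genotyped individuals."""
    ref, var = allele_counts(n_ref_hom, n_het, n_var_hom)
    return var / (ref + var)


def allelic_odds_ratio(cases, controls, z=1.96):
    """Allelic odds ratio with an approximate 95% confidence interval.

    `cases` and `controls` are (ref_hom, het, var_hom) genotype counts.
    All four allele counts must be non-zero for the interval to be defined.
    """
    case_ref, case_var = allele_counts(*cases)
    ctrl_ref, ctrl_var = allele_counts(*controls)
    odds_ratio = (case_var * ctrl_ref) / (case_ref * ctrl_var)
    # Standard error of log(OR) from the 2x2 table of allele counts.
    se = math.sqrt(1 / case_var + 1 / case_ref + 1 / ctrl_var + 1 / ctrl_ref)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical genotype counts: (reference homozygotes, heterozygotes,
# variant homozygotes) among 100 cases and 100 controls.
cases = (30, 50, 20)
controls = (50, 40, 10)

case_freq = variant_frequency(*cases)
control_freq = variant_frequency(*controls)
odds_ratio, ci_lower, ci_upper = allelic_odds_ratio(cases, controls)

print(f"Variant allele frequency in cases:    {case_freq:.2f}")
print(f"Variant allele frequency in controls: {control_freq:.2f}")
print(f"Allelic OR: {odds_ratio:.2f} (95% CI {ci_lower:.2f} to {ci_upper:.2f})")
```

A frequency difference of this kind only indicates association in the sampled groups; as the text notes, the magnitude of risk also depends on interactions with other genes and with the environment, which this single-variant sketch does not model.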