ABSTRACT

The functioning of a living cell is governed by an intricate network of interactions among different types of molecules. A collection of long DNA molecules called chromosomes, which together constitute the genome of the organism, encodes much of the cellular molecular apparatus, including various types of RNAs and proteins. Short DNA sequences that are part of chromosomal DNA, called genes, can be transcribed repeatedly to produce various types of RNAs. Some of these RNAs act directly, such as micro (miRNA), ribosomal (rRNA), small nuclear (snRNA), and transfer (tRNA) RNAs. Many genes give rise to messenger RNAs (mRNAs), which are translated into the corresponding proteins, a diverse and important set of molecules critical for cellular processes. A plethora of small molecules that lie outside the hereditarily derived genes-RNAs-proteins system, called metabolites, play a crucial role in biological processes as intermediary molecules that are both products of and inputs to biochemical enzymatic reactions.

These complex interactions define, regulate, and even initiate and terminate biological processes, and also create the molecules that take part in them. They are pervasive in all aspects of cell function, including the transmission of external signals to the interior of the cell, controlling processes that result in protein synthesis, modifying protein activities and their locations in the cell, and driving biochemical reactions. Gene products coordinate to execute cellular processes, sometimes by acting together, such as multiple proteins forming a protein supercomplex (e.g., the ribosome), and sometimes by acting in a concerted way to create biochemical pathways and networks (e.g., metabolic pathways that break down food, and photosynthetic pathways that convert sunlight to energy in plants). It is the same gene products that also regulate the expression of genes, often through binding to cis-regulatory sequences upstream of genes, to calibrate gene expression for different processes and even to decide which pathways are appropriate to trigger based on external stimuli. The genomic revolution of the past two decades provides the parts list for systems biology.

Advances in high-throughput experimental techniques are enabling measurements of mRNA, protein, and metabolite levels, and the detection of molecular interactions on a massive scale. In parallel, automated parsing and manual curation have extracted information on molecular interactions that has been deposited in the scientific literature over decades of small-scale experiments. In combination, these efforts have provided us with large-scale publicly available datasets of molecular interactions and measurements of molecular activity, especially for well-studied model organisms such as Saccharomyces cerevisiae (baker’s yeast), Caenorhabditis elegans (a nematode), and Drosophila melanogaster (the fruit fly), pathogens such as Plasmodium falciparum (the microbe that causes malaria), and for Homo sapiens itself.

These advances are transforming molecular biology from a reductionist, hypothesis-driven experimental field into an increasingly data-driven science, focused on understanding the functioning of the living cell at a systems level. How do the molecules within the cell interact with each other over time and in response to external conditions? What higher-level modules do these interactions form? How have these modules evolved and how do they confer robustness to the cell? How does disease result from the disruption of normal cellular activities? Understanding the complex interactions among this large and diverse body of molecules at various levels, and inferring the complex pathways and intermediaries that govern each biological process, are among the grand challenges that constitute the field of systems biology.

The data deluge has resulted in an ever-increasing importance placed on the computational analysis of biological data and computationally driven experimental design. Research in this area of computational systems biology (CSB) spans a continuum of approaches [IL03] that includes simulating systems of differential equations, Boolean networks, Bayesian analysis, and statistical data mining. Computational systems biology is a young discipline in which the important directions are still in flux and being defined.

In this chapter, we focus primarily on introducing and formulating the best-studied classes of algorithmic problems that arise in the phenomenological and data-driven analyses of large-scale information on the behavior of molecules in the cell. We focus on research where the problem formulations and algorithms developed have actually been applied to biological data sets. Where possible, we refer to theoretical results and tie the work in the CSB literature to research in the algorithms community. The breadth of topics in CSB and the diversity of the connections between CSB and theoretical computer science preclude an exhaustive coverage of topics and literature within the scope of this short chapter. We caution the reader that our treatment of the topics, its depth, and our citations of the relevant literature are by no means exhaustive. Rather, we attempt to provide a self-contained and logically interconnected survey of some of the important problem areas within this discipline, and provide pointers to a reasonable body of literature for further exploration by the reader. By necessity, this chapter introduces a number of biological terms and concepts that a computer scientist may not be familiar with. A glossary at the end of the chapter provides an easy resource for cross-reference.