ABSTRACT

Theoretical and computational methods are indispensable for the indirect inference of interactions, not only because they are more efficient and simpler to implement than experiments, but also because they often reveal the underlying principles of network connectivity and enable the systematic prediction of new interactions. Insights gained from successful computational models of networks can, in principle, be used to design new experiments that test these insights in a broad context.

It is very difficult to formulate a successful computational model for a biological network. In most cases, such models rest on assumptions that are too simple, and the models are therefore inaccurate. The relatively rare biological models that have few parameters and are based on general principles are called unsupervised models because their construction does not require detailed experimental data. Inaccuracies in these models are compensated by the high-level insights they provide into the processes being modeled. We will see examples of unsupervised models of large networks in Chapter 7.

In many cases, however, it is important for models to be accurate in their details, which requires constantly confronting them with experimental results and adjusting them to better capture these results. Such models must have a more direct relationship with experiment than is prevalent in idealized models: they are constructed via a well-defined procedure that takes experimental data as input and outputs a set of model parameters. They are called supervised models because the numerical values of their parameters depend on the input data. The input data are called the training set for the supervised model. The training set changes as more data become available, leading to refinement of the model parameters. The model parameters can in turn be used to predict new networks.

Supervised approaches that use properties of known network interactions to infer new network interactions fall under the category of machine learning approaches for network inference. While reasonably successful in terms of reproducing known interactions and predicting new ones, these models contain large numbers of parameters and are often too complex to grasp intuitively. This tension is a well-known theme in biology: simple models based on clear, general principles are often inaccurate because they do not capture the complexity of the details, while detailed models with large numbers of parameters fail to reveal general unifying principles. The challenge of finding simple yet accurate mathematical and computational models for large, complex biological systems remains largely unresolved.

In this chapter, we begin our foray into computational methods for biological network inference. We start with a primer on entropy and information, because a basic understanding of these concepts is essential to understanding the computational inference of regulatory interactions. The primer is followed by a discussion of regulatory network inference and then by a description of methods for inferring protein-protein interactions.