ABSTRACT

To this point we have discussed various aspects of uncertainty arising in inverse problem techniques. All discussions have been in the context of a given set or sets of data analyzed under various assumptions on how the data were collected (e.g., independent sampling, absolute measurement error, relative measurement error). For many years now, scientists (and especially engineers) have been actively involved in designing experimental protocols to best study engineering systems, including the parameters describing their mechanisms [4, 7, 16, 17, 18, 21, 22, 23]. Recently, with the increased involvement of scientists in collaborative efforts with biologists and quantitative life scientists, renewed interest in the design of “best” experiments to elucidate mechanisms has emerged [4]. Thus a major question that experimentalists and inverse problem investigators alike often face is how best to collect data so that one can estimate model parameters efficiently and accurately. This is the well-known and widely studied optimal design problem. We would be remiss in this monograph if we did not provide at least a brief introduction to the ideas and issues that arise in this methodology.

Traditional optimal design methods (D-optimal, E-optimal, c-optimal) [7, 16, 17, 18] use information from the model to find the sampling distribution or mesh for the observation times (and/or locations, in spatially distributed problems) that minimizes a design criterion, quite often a function of the Fisher Information Matrix (FIM). Experimental data taken on this optimal mesh are then expected to yield accurate parameter estimates. In many scientific fields where mathematical modeling is utilized, models grow increasingly complex over time, containing possibly more state variables and parameters, as the underlying governing processes of a system are better understood and refinements of mechanisms are considered. Additionally, as technology invents and improves devices to measure physical and biological phenomena, new data become available to inform mathematical modeling efforts. The world is approaching an era in which the vast amounts of information available to researchers may be overwhelming or even counterproductive to their efforts. We outline a framework based on the FIM for a system of ordinary differential equations (ODEs) to determine when an experimenter should take samples of
a physical or biological process modeled by a dynamical system.

Inverse problem methodologies are discussed in the previous chapters in the context of a dynamical system or mathematical model for which a sufficient number of observations of one or more states (variables) is available. The choice of method depends on the assumptions the modeler makes on the form of the error between the model and the observations (the statistical model). The most prevalent source of error is observation error, which is made when collecting data. (One can also consider model error, which originates from the differences between the model and the underlying process that the model describes, but this is often quite difficult to quantify.) Measurement error is most readily discussed in the context of statistical models. The three techniques commonly addressed are maximum likelihood estimation (MLE), used when the probability distribution of the error is known; ordinary least squares (OLS), used when the error has constant variance across observations; and generalized least squares (GLS), used when the variance of the observations is non-constant but can be expressed as a known function. Uncertainty quantification is also described for optimization problems of this type, namely in the form of observation error covariances, standard errors, residual plots, and sensitivity matrices. Techniques to approximate the variance of the error are also included in these discussions. In [11], the authors develop an experimental design theory using the FIM to identify optimal sampling times for experiments on physical processes (modeled by an ODE system) in which scalar or vector data are taken. The experimental design technique developed there is applied in numerical simulations to the logistic curve, a simple ODE model describing glucose regulation, and a harmonic oscillator example. In addition to when to take samples, the question of what variables to
measure is also very important in designing effective experiments, especially when the number of state variables is large. Use of such a methodology to optimize what to measure would further reduce testing costs by eliminating extra experiments to measure variables neglected in previous trials (see [9]). In [6], the best set of variables to observe for an ODE system modeling the Calvin cycle [24] is identified using two methods. The first, an ad hoc statistical method, determines which variables directly influence an output of interest at any one particular time. Such a method does not utilize the information on the underlying time-varying processes provided by the dynamical system model. The second method is based on optimal design ideas. Extensions of this method are developed in [12, 13]. Specifically, in [12] the authors compare the SE-optimal design introduced in [10] and [11] with the well-known methods of D-optimal and E-optimal design on a 6-compartment HIV model [3] and a 31-dimensional model of the Calvin cycle [24]. Models for which there may be a wide range of variables to possibly observe are not only ideal candidates on which to test the proposed methodology, but are also widely encountered in applications. For example, the methods have recently been used in [14, 15] to design optimal data collection, in terms of the location and number of sensors needed for
measuring activity along the scalp. We now turn to an outline of this methodology for determining the best times at which to make observations and the best variables to observe.
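To fix ideas, the FIM-based comparison of sampling meshes described above can be sketched in a few lines. The following Python fragment is an illustrative sketch only: the parameter values, noise level, and the two candidate meshes are assumptions chosen for demonstration (not taken from the references), and the sensitivities are approximated by finite differences on the closed-form logistic solution rather than by solving the sensitivity equations alongside the ODE.

```python
import numpy as np

# Illustrative sketch: parameter values, noise level, and candidate meshes
# below are assumptions for demonstration, not taken from the references.

def logistic(t, r, K, x0=1.0):
    """Closed-form solution of the logistic ODE x' = r x (1 - x/K)."""
    return K * x0 * np.exp(r * t) / (K + x0 * (np.exp(r * t) - 1.0))

def fim(times, theta, sigma=0.1, h=1e-6):
    """Fisher Information Matrix F = S^T S / sigma^2 under constant-variance
    observation error; the sensitivity matrix S (observations x parameters)
    is approximated with central finite differences."""
    theta = np.asarray(theta, dtype=float)
    S = np.empty((len(times), len(theta)))
    for j in range(len(theta)):
        tp, tm = theta.copy(), theta.copy()
        tp[j] += h
        tm[j] -= h
        S[:, j] = (logistic(times, *tp) - logistic(times, *tm)) / (2.0 * h)
    return S.T @ S / sigma**2

theta = [0.7, 17.5]                    # (r, K): illustrative "true" values
uniform = np.linspace(0.0, 25.0, 10)   # mesh spanning growth and saturation
early = np.linspace(0.0, 5.0, 10)      # mesh concentrated in early growth

# D-optimal comparison: the mesh with the larger det(F) is preferred.
print("det F, uniform mesh:", np.linalg.det(fim(uniform, theta)))
print("det F, early mesh:  ", np.linalg.det(fim(early, theta)))
```

Here det(F) plays the role of the D-optimal criterion; an E-optimal variant would instead compare the smallest eigenvalue of F (equivalently, the largest eigenvalue of F⁻¹). In practice one optimizes the chosen criterion over admissible meshes rather than comparing two fixed candidates.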