ABSTRACT

We learn from data, both experimental and observational. Scientists propose hypotheses about the underlying mechanism of the subject under study. These hypotheses are then tested by comparing the logical consequences derived from them with the observed data. A hypothesis is a model of the real world, and its logical consequences are what the model predicts. Comparing model predictions with observations determines whether the proposed model is likely to have produced the observed data. A positive result provides evidence supporting the proposed model, while a negative result is evidence against it. This is a typical process of scientific inference, and the proper handling of uncertainty in the data and in the model is often its chief difficulty. The role of statistics in scientific research is to provide quantitative tools for bridging the gap between observed data and proposed models.

The foundation of modern statistics was laid down in part by R.A. Fisher in his 1922 paper “On the Mathematical Foundations of Theoretical Statistics” [Fisher, 1922]. In this paper, Fisher launched “the first large-scale attack on the problem of estimation” [Bennett, 1971] and introduced a number of influential new concepts, including the level of significance and the parametric model. These concepts and terms became part of the scientific lexicon routinely used in the environmental and ecological literature. The philosophical contribution of the 1922 essay is Fisher’s conception of inference logic, the “logic of inductive inference.” At the center of this inference logic is the role of “models” – what is to be understood by a model, and how models are to be embedded in the logic of inference. Fisher’s definition of the purpose of statistics is perhaps the best description of the role of a model in statistical inference:

In order to arrive at a distinct formulation of statistical problems, it is necessary to define the task which the statistician sets himself: briefly, and in its most concrete form, the object of statistical methods is the reduction of data. A quantity of data, which usually by its mere bulk is incapable of entering the mind, is to be replaced by relatively few quantities which shall adequately represent the whole, or which, in other words, shall contain as much as possible, ideally the whole, of the relevant information contained in the original data.