ABSTRACT

As opposed to simple data description, statistical inference involves drawing conclusions about a data-generating process based on the implications of probability models for observed events. Parameters are key descriptors or indices of these models and are almost always unobservable. Inferential statistics, then, is essentially the process of drawing conclusions about these parameters from the observed data. In carcinogenicity analyses the key parameter is almost always related to the probability of developing a tumor as a function of increasing drug dose. Sometimes it will also involve other covariates such as gender or baseline weight or, for combination drugs, mixtures of drug doses.
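
As a concrete illustration of such a dose-response parameterization, the Python sketch below evaluates tumor probability at several doses under a simple logistic model. The logistic form and all numerical values are assumptions made purely for illustration; they are not taken from this chapter or from any real study.

```python
import numpy as np

def tumor_probability(dose, alpha, beta):
    """Hypothetical logistic dose-response model:
    Pr(tumor | dose) = 1 / (1 + exp(-(alpha + beta * dose)))."""
    return 1.0 / (1.0 + np.exp(-(alpha + beta * dose)))

# Illustrative doses (e.g., mg/kg) and made-up parameter values.
doses = np.array([0.0, 10.0, 50.0, 100.0])
print(tumor_probability(doses, alpha=-3.0, beta=0.03))
```

Covariates such as gender or baseline weight would enter in the same way, as additional terms in the linear predictor.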

A critical concept in all probability models is that of the probability density (mathematically, the Radon-Nikodym derivative with respect to some dominating measure) of the random variable that represents the potentially observed values. The sum of the values of the density, or more generally its integral, over a specified set of values gives the probability that the random variable takes a value in that set. In a typical model the distribution is indexed by one or more parameters, which in this chapter are generally denoted by θ. The symbol θ may denote a single value or a vector of parameter values; the difference should be clear from the context. When this probability function or density is considered as a function of the parameters it is called a likelihood. Thus, for a density f(y|θ) and a specific observed value y of Y, the function L(θ|y) = f(y|θ) is a likelihood function of θ. A commonly expressed caveat is that, given the observed value of Y, the likelihood provides only a measure of how "relatively likely" any particular value of θ is.
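
To make the distinction between a density and a likelihood concrete, the short sketch below (Python with SciPy; the tumor count and dose-group size are invented for illustration) fixes an observed value y and evaluates the binomial density at several candidate values of θ, i.e., the likelihood L(θ|y).

```python
import numpy as np
from scipy.stats import binom

# Suppose y = 4 tumors are observed among n = 50 animals in one dose group
# (illustrative numbers). Holding y fixed and varying theta turns the
# density f(y | theta) into the likelihood L(theta | y).
n, y = 50, 4
thetas = np.linspace(0.02, 0.20, 7)
likelihood = binom.pmf(y, n, thetas)

# Relative likelihood, scaled by its maximum; values near 1 indicate
# parameter values that are "relatively likely" given the observed y.
print(likelihood / likelihood.max())
```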

The so-called likelihood principle in statistics is that all of the information from the sample about the parameters is contained in the likelihood function. Thus, for those who accept this principle, all analyses should be based on the likelihood. Birnbaum showed that this principle is a consequence of sufficiency and a simple conditionality argument. However, not all statisticians agree with this principle (for an extensive discussion with comments, see Berger and Wolpert [1]).
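
A standard textbook illustration of the likelihood principle (not specific to this chapter) is that 3 successes and 9 failures yield proportional likelihoods whether the design fixed the number of trials in advance or stopped at the third success; the sketch below checks the constant ratio numerically.

```python
import numpy as np
from scipy.stats import binom, nbinom

# The same data, 3 successes and 9 failures, under two sampling designs:
# (a) binomial with n = 12 trials fixed in advance;
# (b) negative binomial, sampling until the 3rd success is observed.
theta = np.linspace(0.05, 0.95, 7)
lik_a = binom.pmf(3, 12, theta)     # C(12,3) * t^3 * (1-t)^9
lik_b = nbinom.pmf(9, 3, theta)     # C(11,9) * t^3 * (1-t)^9

# The likelihoods differ only by a constant factor (220/55 = 4), so under
# the likelihood principle they carry the same information about theta.
print(lik_a / lik_b)
```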

Suppose we write Pr(A) for the probability of an event A, and Pr(A|B) for the probability of event A given that event B has occurred; the latter is called a conditional probability. Bayes' rule, or Bayes' theorem, is simply a way of relating conditional probabilities. Suppose H is some event of interest and H^c is its complement, i.e., anything other than H, so that Pr(H^c) = 1 - Pr(H). If D is some other, usually observed, event, then in the simplest possible case Bayes' rule states that

Pr(H|D) = Pr(D|H) Pr(H) / [Pr(D|H) Pr(H) + Pr(D|H^c) Pr(H^c)].

This simple observation is the basis of Bayesian analysis in statistics.
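
For a purely numerical reading of the formula above, the sketch below plugs made-up probabilities into the two-event form of Bayes' rule for a hypothetical event H (say, "the compound is carcinogenic") and an observed event D (say, "a positive bioassay result"); none of the numbers come from this chapter.

```python
# Hypothetical inputs to the two-event form of Bayes' rule.
p_H = 0.10                    # prior probability Pr(H)
p_D_given_H = 0.80            # Pr(D | H)
p_D_given_Hc = 0.05           # Pr(D | H^c)

numerator = p_D_given_H * p_H
denominator = numerator + p_D_given_Hc * (1.0 - p_H)
p_H_given_D = numerator / denominator
print(round(p_H_given_D, 3))  # posterior Pr(H | D) = 0.64
```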

Currently there seem to be three main approaches to statistical inference, which might be labeled frequentist, "likelihoodist," and Bayesian. In practice, most statisticians move from one approach to another as convenient. Frequentist methods (emphasized in Chapter 21 [2]) take the parameters as fixed and model the distribution of responses; one then indirectly assesses the implications of the model by comparing the observed statistics to that distribution. However, the a priori nature of this derivation implies that values that were not observed affect the conclusions. This leads to the often-noted observation that many frequentist techniques depend as much or more on what did not happen as on what did happen. Because these techniques depend upon values that did not occur in the sample, they violate the likelihood principle. Statistics that are basically frequentist in derivation but do follow the likelihood principle are sometimes called likelihoodist.
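
One way to see how frequentist conclusions can depend on unobserved outcomes is the tail-area calculation sketched below (Python with SciPy; the counts and the null value θ0 are illustrative assumptions): the one-sided p-value sums probabilities over outcomes more extreme than the one observed, while the likelihood at θ0 uses only the observed count.

```python
from scipy.stats import binom

# Illustrative data: y = 4 tumors among n = 50 animals; null value theta0 = 0.02.
n, y, theta0 = 50, 4, 0.02

# One-sided p-value: Pr(Y >= y | theta0) sums over y, y+1, ..., n,
# outcomes that were never actually observed.
p_value = binom.sf(y - 1, n, theta0)

# The likelihood contribution at theta0 depends only on the observed y.
likelihood_at_null = binom.pmf(y, n, theta0)

print(round(p_value, 4), round(likelihood_at_null, 4))
```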