ABSTRACT

In many applications of statistics, one encounters observations that take one of two possible values. Such binary data are often recorded together with covariates, or explanatory variables, that may be continuous, discrete, or categorical. The relation between the response and the covariates is usually modeled by assuming that the probability of a "positive response", after a suitable transformation, is linear in the covariates. Let $(y_i, \mathbf{x}_i)$, $i = 1, \ldots, N$, be the set of observations, where $\mathbf{x}_i^T = (x_{1i}, \ldots, x_{ki})$ is the vector of covariates and the binary response $y_i$ is either 0 or 1. Binary regression models assume that the random variables $Y_1, \ldots, Y_N$ are independent and

$$P(Y_i = 1) = H(\mathbf{x}_i^T \boldsymbol{\beta}), \quad i = 1, \ldots, N. \tag{1}$$

Here $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_k)^T$ is a vector of unknown parameters and the function $H$ is usually assumed to be known. In the terminology of Generalized Linear Models (McCullagh and Nelder (1989)), $H$ is the inverse link function. For ease of exposition, we refer to $H$ as the link function in this article. The popular probit and logit models are obtained if $H$ is chosen as the standard normal cumulative distribution function (cdf) $\Phi$ or the cdf of the standard logistic distribution, respectively. Such choices of $H$ are often made for convenience and on an ad hoc basis.
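
As a rough illustration of model (1), the following Python sketch evaluates the success probability under the probit and logit choices of H. The function names and the covariate and parameter values are hypothetical and chosen only for illustration; they do not come from the article.

```python
import math


def probit_H(eta):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def logit_H(eta):
    """Standard logistic cdf, 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def success_probability(x, beta, H):
    """Return P(Y = 1) = H(x^T beta) for a single covariate vector x."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor x^T beta
    return H(eta)


# Hypothetical covariates and parameters (assumed values, not from the article)
x = [1.0, 0.5, -2.0]
beta = [0.2, 1.0, 0.3]

p_probit = success_probability(x, beta, probit_H)
p_logit = success_probability(x, beta, logit_H)
print(f"probit: {p_probit:.4f}, logit: {p_logit:.4f}")
```

Both choices map the same linear predictor to a probability in (0, 1), but they generally give different values away from zero, which is one reason the choice of H matters.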