ABSTRACT

In some situations, we suspect some kind of relationship between two variables. A study is undertaken to examine the degree of relationship, which could profitably be used to predict one variable using the information on the other variable. Studies undertaken to establish relationships may be called correlation studies or association studies. The term association is usually used when the variables are attributes, that is, when their values belong to one of several categories. After establishing the existence of a relationship, we may want to model this relationship so that it can be used for prediction purposes. Studies undertaken for this purpose can be termed regression studies. First we consider the problem of testing for association or correlation, and then we discuss the linear regression problem. Other commonly used nonparametric regression procedures based on projection pursuit and the nearest neighbor are beyond the scope of this book. In projection pursuit regression, the dependent variable is estimated iteratively by a sum of general smooth functions of linear projections of several predictor variables; the interested reader is referred to Friedman and Stuetzle (1981). In the simplest case of nearest neighbor regression, the dependent variable is estimated by the mean or median of the responses at the k nearest neighbors in the training data, with nearness measured by Euclidean distance. For further details see Bhattacharya and Mack (1987), Devroye et al. (1994), and Yang (1998). We will only discuss the logistic and PH regression models.
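As a rough illustration of the simplest nearest neighbor regression described above, the following minimal sketch estimates a response by the mean or median of the responses at the k training points closest in Euclidean distance. The function name `knn_regress`, its parameters, and the toy data are assumptions made for this illustration only; they are not taken from the cited references.

```python
import math
import statistics


def knn_regress(train_x, train_y, query, k=3, use_median=False):
    """Estimate the response at `query` from its k nearest training points.

    Hypothetical helper for illustration: `train_x` is a list of
    equal-length coordinate tuples, `train_y` the matching responses.
    """
    if len(train_x) != len(train_y):
        raise ValueError("train_x and train_y must have the same length")
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")

    # Euclidean distance from the query to every training point.
    distances = [math.dist(x, query) for x in train_x]

    # Indices of the k training points closest to the query.
    nearest = sorted(range(len(train_x)), key=distances.__getitem__)[:k]
    responses = [train_y[i] for i in nearest]

    # Summarize the neighboring responses by their median or their mean.
    if use_median:
        return statistics.median(responses)
    return statistics.fmean(responses)


# Toy data (made up for this sketch): one predictor, five observations.
train_x = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
train_y = [0.0, 1.0, 5.0, 3.0, 50.0]

mean_estimate = knn_regress(train_x, train_y, (1.2,), k=3)
median_estimate = knn_regress(train_x, train_y, (1.2,), k=3, use_median=True)
```

With these toy data the three nearest neighbors of 1.2 are the points at 1, 2, and 0, so the mean-based estimate averages the responses 1, 5, and 0, while the median-based estimate is less affected by the comparatively large response 5.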