ABSTRACT

This chapter describes a family of methods that are well suited to estimating the probability of a binary outcome. Only two outcomes are considered, coded 1 and 0: 1 indicates that a particular medical condition of interest was found to be present, and 0 indicates that it was not. The chapter reviews the penalized log likelihood estimate described by O'Sullivan, Yandell, and Raynor and uses it as a simple vehicle for describing the bias-variance trade-off. It discusses the subtle issues involved in choosing the smoothing parameters, as well as a key quantity, df-signal. The chapter then applies several PSA models to estimating the risk of diabetes mellitus from the Pima-Indian data set in the UCI Repository of Machine Learning Databases, and compares the best of these PSA models with the ADAP NN classification algorithm as applied by A. F. Smith et al. to the same data set.
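For orientation, a penalized log likelihood estimate of this kind can be sketched as follows. The notation is illustrative and assumed, not taken from the chapter: $y_i \in \{0,1\}$ is the observed outcome for the $i$-th of $n$ subjects with attribute vector $t_i$, $f(t) = \log[p(t)/(1-p(t))]$ is the logit of the probability $p(t)$ that $y = 1$, $J(f)$ is a generic roughness penalty, and $\lambda > 0$ is a smoothing parameter.

```latex
\hat f_\lambda = \arg\min_{f} \; -\frac{1}{n}\sum_{i=1}^{n}
  \Bigl[\, y_i f(t_i) - \log\bigl(1 + e^{f(t_i)}\bigr) \Bigr]
  + \lambda J(f),
\qquad
\hat p_\lambda(t) = \frac{e^{\hat f_\lambda(t)}}{1 + e^{\hat f_\lambda(t)}}
```

The bracketed term is the Bernoulli log likelihood $y \log p + (1-y)\log(1-p)$ rewritten in terms of the logit $f$. Under this sketch, increasing $\lambda$ yields a smoother estimate with lower variance but more bias, while decreasing $\lambda$ lets the fit track the data more closely at the cost of higher variance; this is the trade-off that the choice of smoothing parameters must balance.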