ABSTRACT

The goal of Chapter 15 is to provide analytic formulas for characterizing the sampling error of specific estimators for a large class of unimodal and multimodal smooth empirical risk functions. Under general conditions, a function of an empirical risk function minimizer is shown to converge in distribution to a multivariate Gaussian random vector. This multivariate Gaussian random vector has a mean and covariance matrix that can be estimated using the first and second derivatives of the loss function used to construct the empirical risk function. The resulting covariance matrix estimator is a robust sandwich covariance matrix estimator. Examples of the application of the theory to linear regression, nonlinear regression, multilayer perceptrons, and Gaussian mixture models are provided. In addition, the asymptotic statistical theory is used to construct confidence regions for the parameter estimates and to derive Wald hypothesis tests. Bayesian hypothesis testing is also briefly discussed. A unique contribution of this chapter is a careful presentation of the assumptions required for the mathematical theory to hold and a discussion of how those assumptions are relevant for commonly encountered machine learning problems.
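
To make the construction concrete, the following is a minimal sketch of the form such a robust sandwich covariance matrix estimator typically takes in the standard M-estimation setting; the notation (loss c, empirical risk \hat{\ell}_n, and the estimators \hat{A}_n, \hat{B}_n, \hat{C}_n) is illustrative and is not drawn from the chapter itself.

% Sketch (notation illustrative). Empirical risk and its minimizer:
\hat{\ell}_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} c(x_i,\theta), \qquad
\hat{\theta}_n = \arg\min_{\theta} \hat{\ell}_n(\theta).

% Second-derivative (Hessian-type) and first-derivative (gradient outer-product) estimators:
\hat{A}_n = \frac{1}{n}\sum_{i=1}^{n} \nabla^2_{\theta} c(x_i,\hat{\theta}_n), \qquad
\hat{B}_n = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta} c(x_i,\hat{\theta}_n)\, \nabla_{\theta} c(x_i,\hat{\theta}_n)^{\top}.

% Robust sandwich covariance matrix estimator, so that approximately
% \hat{\theta}_n \sim \mathcal{N}\!\left(\theta^{*},\; \hat{C}_n / n\right):
\hat{C}_n = \hat{A}_n^{-1}\, \hat{B}_n\, \hat{A}_n^{-1}.

When the probability model happens to be correctly specified, \hat{A}_n and \hat{B}_n estimate the same matrix and the sandwich reduces to the familiar inverse-Hessian (inverse Fisher information) covariance estimator; the sandwich form remains valid under model misspecification, which is why it is described as robust.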