ABSTRACT

Model selection is the task of choosing a statistical model from a set of candidate models, given the data. It is a central issue in statistical inference. The methods and theory for Gaussian process regression discussed so far in this book rest on the assumption that the data come from the true model (1.10) and (1.11). This implies that we know in advance which covariance function to use and on which covariates the response variable depends. In practice, however, several questions arise before any model is applied to a dataset. What data are to be used? What kinds of statistical analysis or models are appropriate? If a variety of models could be used, which one is the best choice? These questions constitute the problem of model selection. For Gaussian process regression, there are three major issues: the selection of the covariance function; the selection of the covariates to be included in the covariance kernel function, i.e., variable selection in Gaussian process regression; and the selection of the values of the hyper-parameters. The third issue was discussed in the previous chapter, using an empirical Bayesian approach and other methods, under the assumption that a particular covariance kernel with a given set of covariates had already been chosen. In this chapter, we focus on the other two issues: how to select a suitable covariance kernel and how to select the covariates on which the response variable depends.