Auxiliary variable selection in a statistical matching problem

doi:10.1201/9781315120416-5

ABSTRACT

This chapter argues that an appropriate selection of the matching variables can be found through the notion of uncertainty, that is, by selecting those variables that minimize the uncertainty region. Statistical matching aims at combining information available in distinct sample surveys referring to the same target population when the two samples are disjoint. The method proposed for selecting matching variables when dealing with categorical X, Y and Z variables relies on a simple idea: select the subset of the X variables that is most effective in reducing the uncertainty, measured, for instance, in terms of d. Selecting the matching variables by exploring uncertainty is computationally demanding but has the advantage of avoiding separate analyses. Classic methods for variable selection are based on the analysis of the explanatory power of the auxiliary variables X with respect to the target variables, for instance by analysing the residuals of a model.
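
The selection idea can be illustrated with a minimal sketch, which is not the chapter's own implementation. It assumes that sample A observes (X, Y), that sample B observes (X, Z), and that uncertainty is summarised by the average width of the Fréchet bounds on the cell probabilities P(Y, Z) given the chosen X subset. That width is an assumed stand-in for the measure d, which the abstract does not define. The function names `avg_bound_width` and `rank_subsets`, the record layout, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def avg_bound_width(sample_a, sample_b, x_vars, y="Y", z="Z"):
    """Average width of the Frechet bounds on P(Y, Z) for one X subset.

    Assumption: P(X) is estimated from the pooled samples, P(Y|X) from
    sample A and P(Z|X) from sample B, all by relative frequencies.
    """
    def cell(rec):
        return tuple(rec[v] for v in x_vars)

    n_a = Counter(cell(r) for r in sample_a)
    n_b = Counter(cell(r) for r in sample_b)
    n_xy = Counter((cell(r), r[y]) for r in sample_a)
    n_xz = Counter((cell(r), r[z]) for r in sample_b)
    y_cats = {r[y] for r in sample_a}
    z_cats = {r[z] for r in sample_b}
    total = len(sample_a) + len(sample_b)

    width = 0.0
    for x in set(n_a) | set(n_b):
        if n_a[x] == 0 or n_b[x] == 0:
            # A conditional distribution cannot be estimated for this cell.
            raise ValueError(f"X cell {x} is not observed in both samples")
        p_x = (n_a[x] + n_b[x]) / total
        for j in y_cats:
            p_y = n_xy[(x, j)] / n_a[x]
            for k in z_cats:
                p_z = n_xz[(x, k)] / n_b[x]
                lower = max(0.0, p_y + p_z - 1.0)  # Frechet lower bound
                upper = min(p_y, p_z)              # Frechet upper bound
                width += p_x * (upper - lower)
    return width / (len(y_cats) * len(z_cats))


def rank_subsets(sample_a, sample_b, candidates, y="Y", z="Z"):
    """Score every non-empty subset of candidates, narrowest bounds first."""
    scored = []
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            w = avg_bound_width(sample_a, sample_b, subset, y, z)
            scored.append((subset, w))
    # Ties in width are broken in favour of the smaller subset.
    return sorted(scored, key=lambda item: (round(item[1], 12), len(item[0])))


# Toy data (hypothetical): Y is fully determined by X1, X2 is uninformative.
toy_a = [{"X1": a, "X2": b, "Y": a} for a in (0, 1) for b in (0, 1)]
toy_b = [{"X1": a, "X2": b, "Z": c}
         for a in (0, 1) for b in (0, 1) for c in (0, 1)]

for subset, w in rank_subsets(toy_a, toy_b, ["X1", "X2"]):
    print(subset, w)
```

The exhaustive loop over subsets reflects the computational cost mentioned above: with p candidate variables it evaluates 2^p - 1 subsets. The sketch raises an error for X cells that are missing from one sample, which becomes more likely as the cross-classification of X grows finer.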