ABSTRACT

In the multiple decision setting, the statistical framework, the definition of the value of a fixed regime, the characterization of an optimal regime, and the assumptions under which the value of a fixed regime, an optimal regime, and the value of an optimal regime can be estimated are all more complicated than in the single decision case. Thus, an overview of the multiple decision problem is presented, both to set the stage for the precise, detailed account of these developments in subsequent chapters and to provide a less technical account for readers interested primarily in the “big picture.” A multiple decision regime is defined, and the statistical potential outcomes framework, the required assumptions (the stable unit treatment value assumption (SUTVA), sequential randomization, and positivity), and the required data are described. A fundamental result, the g-computation algorithm, is presented; it shows that the value of a fixed regime, defined in terms of potential outcomes, is identifiable from observed data under these assumptions. Methods for estimating the value of a fixed regime based on g-computation and on inverse probability weighting are described. An optimal regime is characterized using the principle of backward induction, and estimation of an optimal regime using Q-learning and methods based on inverse probability weighting is presented.