Pseudo-Value Regression Models

doi:10.1201/b16248-18

ABSTRACT

Much of the survival analysis literature focuses on inference in the presence of missing data, typically due to right censoring. Often these survival models are formulated as hazard regression models. In this article we review an alternative approach to inference with incomplete survival data, based on pseudo-values (or pseudo-observations) obtained from a jackknife statistic constructed from non-parametric estimators of the quantity of interest. These pseudo-values are then used as outcome variables in a generalized linear model, and the model parameters are estimated using generalized estimating equations (GEE) (Liang and Zeger, 1986). This approach was first proposed by Andersen et al. (2003) for direct modeling of state probabilities in a multi-state model. It can be applied to regression models for any mean value parameter. In particular, the general approach extends to a number of non-standard settings, including survival probabilities at a fixed point in time (Klein et al., 2007), cumulative incidence (Klein and Andersen, 2005; Klein et al., 2008), restricted mean survival (Andersen et al., 2004), quality-adjusted survival (Andrei and Murray, 2007; Tunes-da Silva and Klein, 2009), multi-state models (Andersen and Klein, 2007), and clustered time-to-event data (Logan et al., 2011). These pseudo-values have also been used as outcome variables in a scatter plot or to compute pseudo-residuals for a regression model in order to facilitate goodness-of-fit assessment (Perme and Andersen, 2008; Andersen and Perme, 2010). The main advantage of pseudo-value regression is that it provides a simple and generalizable method of modeling complex time-to-event data that are often not easily modeled using standard techniques. Furthermore, pseudo-value regression is easily implemented using existing software packages once the pseudo-values have been obtained.
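To make the jackknife construction concrete, the following is a minimal sketch for one of the settings listed above, the survival probability at a fixed time point t0. It assumes the non-parametric estimator is the Kaplan-Meier estimator and uses the usual leave-one-out form, in which the pseudo-value for subject i is n times the full-sample estimate minus (n - 1) times the estimate with subject i removed. The function names `km_survival` and `pseudo_values` are illustrative and are not taken from any particular package.

```python
import numpy as np


def km_survival(time, event, t0):
    """Kaplan-Meier estimate of S(t0) from right-censored data.

    time  : observed times (event or censoring)
    event : 1/True if the event was observed, 0/False if censored
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    # Multiply the conditional survival factors over the distinct
    # observed event times up to and including t0.
    for t in np.unique(time[event & (time <= t0)]):
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_values(time, event, t0):
    """Leave-one-out jackknife pseudo-values for S(t0).

    Pseudo-value for subject i:
        n * theta_hat - (n - 1) * theta_hat_without_i
    where theta_hat is the Kaplan-Meier estimate of S(t0).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    full = km_survival(time, event, t0)
    keep = np.ones(n, dtype=bool)
    pv = np.empty(n)
    for i in range(n):
        keep[i] = False  # drop subject i
        loo = km_survival(time[keep], event[keep], t0)
        pv[i] = n * full - (n - 1) * loo
        keep[i] = True   # restore subject i
    return pv
```

A quick sanity check on this sketch: when no observation is censored, the Kaplan-Meier estimate is the empirical survival function, so each pseudo-value reduces to the indicator that the subject's time exceeds t0. With censoring, the pseudo-values are no longer restricted to 0 and 1, and they are what would be passed as the outcome variable to a GEE routine in an existing software package.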