ABSTRACT

An increasingly important goal in medical research is to extract information from a large number of variables measured on patients in order to make predictions of disease-related outcomes. When the outcome of interest is a possibly censored time to event, such as the time to disease recurrence or death, statistical methods that account for censoring must be used. Most classical statistical methods that relate covariates to outcome assume that the number of covariates, p, is less than the number of observations, n; to work well, most methods require p to be substantially smaller than n. In current research, however, it is common for p to be large relative to n, a data structure usually described as high-dimensional data. Sometimes, as in most genomic studies, the number of covariates vastly exceeds the sample size. This setting is often denoted by p ≫ n, and is occasionally referred to as ultra-high dimensional data. In these settings, most classical statistical methods are not applicable without modification. Here, we review methods that have been developed for relating high-dimensional data to survival outcomes, focusing on methods that can be used to make risk predictions for new observations; for methods related to testing in the high-dimensional setting, see Chapter 15.