ABSTRACT

In recent years, rapid developments in high-throughput biotechnology have enabled researchers to generate thousands of potentially interesting measurements per subject. These measurements are especially valuable in the field of survival analysis, where knowledge of the human genome could greatly enhance our understanding of many diseases and lead to more accurate survival prediction models. However, the advent of gene expression data and other types of high-dimensional genomic data not only gave rise to numerous new opportunities but also brought new computational and methodological challenges. Standard survival prediction methods, such as multivariate Cox regression, can no longer be applied directly when the number of covariates greatly exceeds the number of subjects. Identifying influential covariates also becomes more complicated, because thousands of hypotheses have to be tested simultaneously. To control the number of false discoveries (i.e., the number of covariates that are believed to be influential while in fact they are not), proper adjustments for the number of tests performed are needed: the so-called multiple testing corrections.
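
To make the idea of a multiple testing correction concrete, the sketch below implements the Benjamini-Hochberg step-up procedure, one widely used method for controlling the false discovery rate. The abstract does not name a specific correction, so the choice of procedure, the function name `benjamini_hochberg`, and the simulated gene-level p-values are illustrative assumptions, not the method used in this work.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for FDR control.

    Returns a boolean array marking which hypotheses are rejected,
    keeping the expected proportion of false discoveries among the
    rejections at or below `alpha`.
    """
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)                        # rank p-values in ascending order
    ranked = p[order]
    # BH critical values: (i / m) * alpha for rank i = 1..m
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()           # largest rank meeting its threshold
        reject[order[: k + 1]] = True            # reject all hypotheses up to rank k
    return reject

# Illustrative example: 1000 simulated gene-level p-values, mostly null
rng = np.random.default_rng(0)
p_vals = np.concatenate([rng.uniform(size=980),           # null genes
                         rng.uniform(0, 1e-4, size=20)])  # truly influential genes
print(benjamini_hochberg(p_vals).sum(), "genes declared influential at FDR 0.05")
```

Unlike a Bonferroni correction, which controls the probability of even one false discovery and becomes very conservative across thousands of genes, this procedure controls the expected fraction of false discoveries among the covariates declared influential, which is typically the more useful guarantee in genomic screening.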