ABSTRACT

To prepare for modeling, a dataset must be constructed that contains predictor variables and event time outcomes from previous patients. This chapter discusses the data preparation issues that arise when the overall goal is to develop a risk prediction model for implementation in clinical, public health, or other real-life settings. When the aim is a risk prediction model, using age as the time scale is not advised, because the only time zero that all subjects share on that scale is birth. Our philosophy is that the subject matter expert should make the initial selection of the predictor variables to be included in the model. A predictor variable can be derived from one or several of the other variables in the database. For all continuous variables, one should record the lowest and highest possible values and, if available, normal population reference values.
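
As a rough illustration of three of the points above, the following Python sketch derives a predictor from other variables, checks continuous variables against recorded lowest and highest possible values, and measures event time from a common time zero rather than on the age scale. Everything in it is an assumption made for the example and is not taken from the chapter: the variable names, the limits in LIMITS, the choice of body mass index as the derived predictor, and the choice of a baseline visit as time zero.

```python
from datetime import date

# Hypothetical plausibility limits (lowest, highest possible value).
# The numbers are illustrative only, not clinical reference values.
LIMITS = {"weight_kg": (1.0, 400.0), "height_m": (0.3, 2.6)}


def out_of_range(record):
    """Return the names of continuous variables outside their limits."""
    flagged = []
    for name, (low, high) in LIMITS.items():
        value = record.get(name)
        if value is not None and not (low <= value <= high):
            flagged.append(name)
    return flagged


def derive_bmi(record):
    """Derive one predictor (BMI) from two other variables."""
    return record["weight_kg"] / record["height_m"] ** 2


def years_since_baseline(baseline, event):
    """Event time in years, measured from an assumed baseline visit."""
    return (event - baseline).days / 365.25


patient = {"weight_kg": 80.0, "height_m": 2.0}
bmi = derive_bmi(patient)

# A height entered in centimetres instead of metres is caught by the limits.
suspect = out_of_range({"weight_kg": 80.0, "height_m": 180.0})

follow_up = years_since_baseline(date(2020, 1, 1), date(2021, 1, 1))
```

Keeping the limits in one table next to the data makes unit and entry errors visible before any model is fitted; normal population reference values could be stored alongside them in the same way.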