ABSTRACT

To illustrate the value of feature engineering for enhancing model performance, consider the problem of better predicting patient risk for ischemic stroke. The degree of arterial stenosis has been used to identify patients who are at risk for stroke. One of the first steps of the modeling process is to understand important predictor characteristics, such as their individual distributions, the degree of missingness within each predictor, potentially unusual values within predictors, relationships between predictors, and the relationship between each predictor and the response. The chapter explores potential predictive relationships between individual predictors and the response, and between pairs of predictors and the response. Pairwise interactions between predictors are prime candidates for exploration and may contain valuable predictive relationships with the response. Physicians have a strong preference for logistic regression due to its inherent interpretability.