ABSTRACT

Predicting traffic collisions is a daunting task because of the human factors involved and the large number of variables. To ensure clarity in big data analytics, this chapter follows a standardized framework known as the Cross-Industry Standard Process for Data Mining (CRISP-DM). The CRISP-DM framework consists of six clearly defined phases that span a data mining project: business understanding, data understanding, data preparation, modelling, evaluation and deployment. This chapter investigates the appropriateness of decision tree and logistic regression models for predicting severe and non-severe traffic accidents. The analysis of the data finds both models useful in exploring the business question. Decision trees are one of the most widely used and practical methods for inductive inference. Logistic regression is a commonly used model for measuring the relationship between a categorical dependent variable and one or more independent variables.