ABSTRACT

This chapter gives insight into how Big Data predictive modeling and analytics can be used for effective business planning. At the outset, we note that this is a niche, emerging area of research. Numerous published works deal with predictive modeling and analytics techniques, but they do not focus on the nuances and intricacies associated with Big Data and its four dimensions, namely volume, variety, velocity, and veracity. In this chapter, we therefore concentrate on the Big Data aspects of prediction after giving a brief overview of traditional predictive modeling and analytics techniques. We begin by showing the reader, through case study examples, how precise predictions based on current and historical data can help in effective business planning. After this brief introduction, we explain the predictive modeling process, which starts with the preprocessing step of selecting and preparing the data, continues with fitting a mathematical model to the prepared data, and ends with estimating and validating the predictive model. The next section describes the various types of predictive models, starting with models for supervised learning, namely linear and nonlinear regression, decision trees, random forests, and support vector machines. The section ends with cluster analysis, the only unsupervised learning model covered in this chapter. We then deal with measuring the accuracy of predictive models through target shuffling, lift charts, receiver operating characteristic (ROC) curves, and bootstrap sampling. Next, we focus on the tools and techniques of predictive modeling and analytics. Here we cover CRISP-DM, a process model used for data mining, and describe the implementation of predictive analytics in R, an open-source tool, using a sample case-study application.
We conclude the chapter with insight into current research trends and upcoming initiatives by industry giants in this cutting-edge field.