Data Science Methods and Machine Learning | 7

ABSTRACT

In this chapter, we cover the major areas of machine learning (ML): regression, prediction, classification, and clustering. Regression analysis examines the relationship between a target (dependent) variable and one or more predictor (independent) variables. It also helps us measure how much impact each predictor has on the target variable. Regression techniques differ in the number of predictors, the shape of the regression line, and the type of target variable; the subtopics in the chapter briefly introduce the different types of regression analysis. The chapter also discusses time-series analysis, in which various methods and models are applied to time-series data to extract meaningful insights and data characteristics, and it describes three time-series models. Furthermore, we cover major ML algorithms: support vector machine (SVM), naïve Bayes, and k-nearest neighbor (k-NN). The idea behind the SVM algorithm is to place all the data points of the dataset in a multidimensional space, with one dimension per feature, and then find the hyperplane that best separates the classes. The k-NN algorithm is a simple ML algorithm that is used for regression as well as classification problems. The prediction for an unknown data instance is found by searching for the k most similar instances in the training dataset and summarizing their outputs: the most common class for classification, or the mean value for regression. The second section, on ML, concludes with a brief introduction to the confusion matrix, which is widely used to evaluate a classifier’s performance. By studying this chapter, we will learn to analyze large and complex datasets, create systems that adapt and improve over time, and build intelligent applications that make predictions from data.
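As a rough illustration of the k-NN idea summarized above, the following minimal sketch predicts an output from the k nearest training instances. It assumes Euclidean distance as the similarity measure, a majority vote for classification, and the mean for regression; the function name `knn_predict` and its parameters are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3, task="classification"):
    """Predict the output for `query` from its k nearest training instances.

    Assumption: similarity is measured with Euclidean distance.
    """
    # Pair each training output with its distance to the query point.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    # Sort by distance only, so outputs never need to be comparable.
    distances.sort(key=lambda pair: pair[0])
    neighbors = [y for _, y in distances[:k]]

    if task == "classification":
        # Majority vote among the k nearest neighbors.
        return Counter(neighbors).most_common(1)[0][0]
    # Regression: average the neighbors' numeric outputs.
    return sum(neighbors) / len(neighbors)


# Hypothetical toy data: two well-separated groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

predicted_class = knn_predict(points, labels, (2, 2), k=3)
predicted_value = knn_predict(points, values, (1, 1), k=3, task="regression")
```

With this toy data, the query near the first group takes that group's label, and the regression call averages the outputs of the three closest points. In practice, the choice of k and of the distance measure depends on the dataset.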