ABSTRACT

This chapter introduces regression analysis, which subsumes the t-test and ANOVA analyses of the previous chapters and is the gateway into supervised machine learning. As such, regression analysis may be the most essential statistical technique. This chapter applies basic regression concepts to the simplest models, those with only one predictor variable. The lessR function Regression() provides a comprehensive default analysis, with all aspects of the output under user control. The analysis estimates the parameters (weights) of a linear model that relates the predictor variable to the value to predict. It then applies inferential statistics to estimate the corresponding population values. A residual is the difference between the actual value and the value computed by the model. The residuals are the key to estimating the weights and evaluating model fit. The chosen weights, the intercept and slope coefficients, minimize the sum of the squared residuals, which defines the least-squares solution. When the model is applied to new data, the value it computes is the predicted value, which is accompanied by a prediction interval. Outliers in the data are evaluated. The chapter also illustrates the assumptions of the model and shows how to address a violation of linearity that arises from a curvilinear relationship.
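To make the least-squares idea concrete, the following is a minimal sketch in plain Python rather than the lessR Regression() function the chapter uses. The toy data and the helper names (fit_least_squares, residuals, sum_squared) are hypothetical and chosen only for illustration. The sketch computes the intercept and slope for one predictor from the standard closed-form expressions, forms the residuals as actual minus fitted values, and checks that nudging the slope away from the estimate increases the sum of squared residuals.

```python
# Hypothetical toy data (not from the chapter):
# x is the single predictor, y is the value to predict.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_least_squares(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """Residual = actual value minus the value computed by the model."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_squared(values):
    """Sum of squared values, used here as the least-squares criterion."""
    return sum(v ** 2 for v in values)


intercept, slope = fit_least_squares(x, y)
res = residuals(x, y, intercept, slope)
sse = sum_squared(res)

# Any other line fits worse by this criterion; try a slightly different slope.
sse_perturbed = sum_squared(residuals(x, y, intercept, slope + 0.1))

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"sum of squared residuals = {sse:.3f}")
print(f"with slope + 0.1         = {sse_perturbed:.3f}")
```

The residuals from the fitted line sum to zero (up to rounding), which is a property of any least-squares line that includes an intercept. The sketch covers estimation only; the inferential statistics, prediction intervals, and outlier diagnostics described above are not reproduced here.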