ABSTRACT

Linear regression involves a numerical outcome variable and explanatory variables that are either numerical or categorical. This chapter discusses important statistical concepts like the correlation coefficient, that “correlation isn’t necessarily causation,” and what it means for a line to be “best-fitting.” A regression line is “best-fitting” in that it minimizes some mathematical criteria. The correlation coefficient can be computed using the get_correlation() function in the moderndive package. A regression line is “best-fitting” in that it minimizes some mathematical criteria. Some people prefer comparing the distributions of a numerical variable between different levels of a categorical variable using a boxplot instead of a faceted histogram.