ABSTRACT

Since Galton’s original development, regression has become one of the most widely used tools in data science. One reason is that regression lets us estimate the relationship between two variables while accounting for the effects of other variables that affect both. This approach has been particularly popular in fields where randomized experiments are hard to run, such as economics and epidemiology. Consider, for example, estimating the health effect of fast food from observational data. Fast food consumers are more likely to be smokers, to be drinkers, and to have lower incomes, all of which can affect health. Therefore, a naive regression model that ignores these variables may overestimate the negative health effect of fast food. In this chapter we learn how linear models can help in such situations and how they can be used to describe how one or more variables affect an outcome variable. The chapter also focuses on scoring runs and ignores the two other important aspects of the game: pitching and fielding.