ABSTRACT

Statistical methods quantify patterns and their strength, which makes them essential tools for interpreting data. This chapter illustrates a data science workflow built on a cycle of wrangling, exploring, visualizing, and modeling. It elucidates some of the connections between the sample (the data we have) and the population. The chapter demonstrates and justifies the statistical methods in a setting where one knows the "correct" answer. The discipline of making efficient use of data, which is at the core of statistical methodology, leads to deeper thinking about how to use data, and that thinking applies to large data sets as well. An important question that statistical methods allow us to address is what sample size n is needed to get a result with acceptable reliability. Ultimately, we need to figure out the reliability of a sample statistic from the sample itself. The bootstrap is a statistical method that approximates the sampling distribution even without access to the population.
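
As a rough illustration of the bootstrap idea mentioned above, the following Python sketch resamples a single sample with replacement and uses the spread of the recomputed statistic as an estimate of its standard error. It is a minimal sketch, not the chapter's own code: the function name bootstrap_se, its parameters, and the simulated population (normal with mean 50 and standard deviation 10) are all assumptions made for this example.

```python
import random
import statistics


def bootstrap_se(sample, stat=statistics.mean, n_boot=2000, seed=0):
    """Estimate the standard error of `stat` from the sample alone.

    Hypothetical helper for illustration: draws `n_boot` resamples of the
    same size as `sample`, with replacement, and returns the standard
    deviation of the statistic across those resamples.
    """
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)  # sampling WITH replacement
        replicates.append(stat(resample))
    return statistics.stdev(replicates)


if __name__ == "__main__":
    # Assumed setting where the "correct" answer is known: a simulated
    # population, so the bootstrap estimate can be compared with theory.
    rng = random.Random(1)
    population = [rng.gauss(50, 10) for _ in range(100_000)]

    for n in (25, 100, 400):
        sample = rng.sample(population, n)  # one sample, without replacement
        se = bootstrap_se(sample)
        # For the mean, theory gives roughly sigma / sqrt(n) as a reference.
        print(f"n={n:4d}  bootstrap SE={se:.2f}  theory={10 / n ** 0.5:.2f}")
```

Running the loop over several values of n gives one way to explore the sample-size question: the estimated standard error should shrink roughly in proportion to 1/sqrt(n), so quadrupling the sample size about halves it.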