ABSTRACT

This book introduces the reader to data science using R and the tidyverse. No prior knowledge of college-level programming or mathematics (e.g., calculus or statistics) is required. The book is self-contained, so readers can begin building data science workflows immediately without consulting extensive external resources. The content is aimed at undergraduate students but is equally applicable to students at the graduate level and beyond. Concepts are developed through many real-world examples to motivate the reader.

Upon completion of the text, the reader will be able to:

  • Program proficiently in R
  • Load and manipulate data frames, and "tidy" them using tidyverse tools
  • Conduct statistical analyses and draw meaningful inferences from them
  • Build models from numerical and textual data
  • Generate data visualizations (numerical and spatial) using ggplot2 and understand what is being represented

The accompanying R package "edsdata" contains the synthetic and real datasets used in the textbook and is intended for further practice. An exercise set is also available, designed to be compatible with automated grading tools for instructor use.

part |182 pages

Programming Fundamentals

chapter 1|42 pages

Data Types

chapter 2|60 pages

Data Transformation

chapter 3|78 pages

Data Visualization

part |196 pages

Doing Statistics

chapter 4|38 pages

Building Simulations

chapter 5|44 pages

Sampling

chapter 6|42 pages

Hypothesis Testing

chapter 7|36 pages

Quantifying Uncertainty

chapter 8|34 pages

Towards Normality

part |94 pages

Modeling

chapter 9|60 pages

Regression

chapter 10|32 pages

Text Analysis