ABSTRACT

Analyzing Baseball Data with R Second Edition introduces R to sabermetricians, baseball enthusiasts, and students interested in exploring the richness of baseball data. It equips you with the necessary skills and software tools to perform all the analysis steps, from importing the data to transforming them into an appropriate format to visualizing the data via graphs to performing a statistical analysis.

The authors first present an overview of publicly available baseball datasets and a gentle introduction to the type of data structures and exploratory and data management capabilities of R. They also cover the ggplot2 graphics functions and employ a tidyverse-friendly workflow throughout. Much of the book illustrates the use of R through popular sabermetrics topics, including the Pythagorean formula, runs expectancy, catcher framing, career trajectories, simulation of games and seasons, patterns of streaky behavior of players, and launch angles and exit velocities. All the datasets and R code used in the text are available online.

New to the second edition are a systematic adoption of the tidyverse and incorporation of Statcast player tracking data (made available by Baseball Savant). All code from the first edition has been revised according to the principles of the tidyverse. Tidyverse packages, including dplyr, ggplot2, tidyr, purrr, and broom are emphasized throughout the book. Two entirely new chapters are made possible by the availability of Statcast data: one explores the notion of catcher framing ability, and the other uses launch angle and exit velocity to estimate the probability of a home run. Through the book’s various examples, you will learn about modern sabermetrics and how to conduct your own baseball analyses.

Max Marchi is a Baseball Analytics Analyst for the Cleveland Indians. He was a regular contributor to The Hardball Times and Baseball Prospectus websites and previously consulted for other MLB clubs.

Jim Albert is a Distinguished University Professor of statistics at Bowling Green State University. He has authored or coauthored several books including Curve Ball and Visualizing Baseball and was the editor of the Journal of Quantitative Analysis of Sports.

Ben Baumer is an assistant professor of statistical & data sciences at Smith College. Previously a statistical analyst for the New York Mets, he is a co-author of The Sabermetric Revolution and Modern Data Science with R.

 

chapter |31 pages

The Baseball Datasets

chapter |34 pages

Introduction to R

chapter |26 pages

Graphics

chapter |18 pages

The Relation Between Runs and Wins

chapter |26 pages

Value of Plays Using Run Expectancy

chapter |26 pages

Balls and Strikes Effects

chapter |15 pages

Catcher Framing

chapter |22 pages

Career Trajectories

chapter |28 pages

Simulation

chapter |22 pages

Exploring Streaky Performances

chapter |18 pages

Using a Database to Compute Park Factors

chapter |23 pages

Batted Ball Data from Statcast