CONTENTS

2.1 Introduction
    2.1.1 Computational Topics
2.2 Reading Tables of Race Results into R
2.3 Data Cleaning and Reformatting Variables
2.4 Exploring the Run Time for All Male Runners
    2.4.1 Making Plots with Many Observations
    2.4.2 Fitting Models to Average Performance
    2.4.3 Cross-Sectional Data and Covariates
2.5 Constructing a Record for an Individual Runner across Years
2.6 Modeling the Change in Running Time for Individuals
2.7 Scraping Race Results from the Web
2.8 Exercises
Bibliography

2.1 Introduction

In this era of ‘free and ubiquitous data,’ there is tremendous potential in seeking out data to bring insight to a problem we are working on professionally or to a topic of personal interest. For example, we are interested in understanding how people’s physical performance changes as they age. One source of data on this question is road races. Hundreds of thousands of people participate in road races each year; the race organizers record the runners’ times and often publish individual-level data on the Web. These freely accessible data may give us insight into our question about performance and age.