
CONTENTS

5.1 Introduction
    5.1.1 Computational Topics
5.2 Acquiring the Airline Data Set
5.3 Computing with Massive Data: Getting Flight Delay Counts
    5.3.1 The R Programming Environment
    5.3.2 The UNIX Shell
    5.3.3 An SQL Database with R
    5.3.4 The bigmemory Package with R
5.4 Explorations Using Parallel Computing: The Distribution of Flight Delays
    5.4.1 Writing a Parallelizable Loop with foreach
    5.4.2 Using the Split-Apply-Combine Approach for Better Performance
    5.4.3 Using Split-Apply-Combine to Find the Best Time to Fly
5.5 From Exploration to Model: Do Older Planes Suffer Greater Delays?
Bibliography

5.1 Introduction

Anyone who has dealt with flight delays at the airport understands the associated inconvenience and aggravation. And while we might hope that delays are rare, they are probably more common than you think. Since October 1987, there have been over 50 million flights in the United States that failed to depart at their scheduled times. Around 200,000 of those flights were at least two hours late; some were much later. From these two simple facts we can surmise that delays are not isolated, rare events; they are routine. Since 1987, the number of flights per year has steadily increased, and as this trend continues, we expect to see more inconvenience, more aggravation, and more time lost.