ABSTRACT

This chapter explores approaches for working with larger data sets, which we will call medium data. These data will fit on a personal computer's hard disk, but not necessarily in its memory. Database management systems implementing Structured Query Language (SQL) provide a ubiquitous architecture for storing and querying data that are relational in nature. SQL is a programming language for relational database management systems, which are very efficient for data that are naturally broken into a series of tables linked together by keys. The theoretical foundation of SQL is relational algebra and tuple relational calculus. These ideas were developed by mathematicians and computer scientists, and while they are not required knowledge for our purposes, they help to solidify SQL's standing as a data storage and retrieval system. SQL has been an American National Standards Institute (ANSI) standard since 1986, but the developers of individual implementations follow that standard only loosely.