ABSTRACT

This chapter explores notions of big data and points the reader towards technologies that scale to truly big data. The terms data science and big data are often used interchangeably, but this is not correct. Technically, "big data" is a part of data science: the part that deals with data too large to be handled by an ordinary computer. In big data, tables may be too large to fit on an ordinary computer, the data and the queries on them may arrive too quickly to process, or the data may be distributed across many different systems. The evolution of baseball data illustrates how "big data problems" have arisen as the volume and variety of the data have increased over time.