ABSTRACT

Datasets arising in science, engineering, medicine, and the social sciences, as well as those derived from the Internet, share three significant properties:

• They are extremely large, often holding millions or even billions of records, partly because data collection can now frequently be automated.