ABSTRACT

It is commonplace to assume that the arts and humanities do not generate as much data as the sciences, and that “big data” is not really a meaningful aspect of humanistic research. This is understandable, given that data is typically understood to be registered by mechanical sensors and other devices operating in the physical world, from electron microscopes to space-based telescopes such as Hubble; as Frické (2015) observes, the sciences have become “data-intensive” (653): astronomy, for example, does not just observe “celestial bodies,” since these bodies “can be probed across a wide range of the electromagnetic spectrum” (ibid.). In other words, there is a multiplier effect at work, given our ability to use mechanisms that extend and supplement human visual capacities; the same multiplier effect is seen in data flows, as what Frické calls “sensor networks” are always at work, generating “continuous data about geographical locations, the ocean, the weather, or atmospheric conditions” (ibid.). Frické adds one further source feeding this multiplier effect: human-created data, i.e. data “created by our behaviour in conjunction with computers, smart phones, electronic or digital transactions, location services, and so on” (ibid.).

“Big” data, then, is more than simply a large amount of data: as the above examples suggest, the term refers to “unstructured data that need more real-time analysis” (Chen, Mao, and Liu 2014, 171); that is to say, these massive datasets themselves pose problems in terms of how they are to be understood or utilized. It could be argued that large amounts of data have always been gathered or accumulated by observers, individually or collectively, and that this data was transformed through the activities of industrialization and mechanization, but of course big data comes of age in the era of digital computing, particularly in terms of data generation and analysis. Barnes (2013) gives a prosaic example, drawn from Batty (2013): transportation data collected in London through the Oyster card-swipe system, which registers “7 million individual daily journeys taken on London’s public transportation system,” adding up to “a data set of 15 billion over a 5-year period” (quoted in Barnes 2013, 298). The Royal Statistical Society, in the news section of its magazine Significance (August 2012), contextualizes big data via the research process undertaken to find the Higgs Boson using the Large Hadron Collider, which produces “some 600 million particle collisions per second in its detectors” (2). An individual collision event (i.e. the additional particles generated by each collision) amounts to a megabyte of data:

That means … [the Large Hadron Collider] was producing some 10¹⁵ bytes of information every second – which, for those not intimate with the higher powers of ten, is a petabyte [a million gigabytes]. A standard DVD can store about 5 gigabytes … so the collider [sic] has been filling the equivalent of 200 000 DVDs a second. It has taken the Collider about three years to pin down the Higgs Boson. The reader may calculate for himself or herself the truly massive size of the database it has generated in that time. … One massive dataset has just transformed our understanding of the universe. But that is just one database among others of similar, or even bigger, size.
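
The arithmetic the reader is invited to perform can be sketched in a few lines, taking the quoted figures at face value. This is a minimal back-of-the-envelope illustration only (the variable names are ours), not a claim about what the Large Hadron Collider actually records, since its trigger systems discard the vast majority of collision events before anything is stored:

```python
# Back-of-the-envelope check of the figures quoted above, taken at face value.
collisions_per_second = 600e6   # "some 600 million particle collisions per second"
bytes_per_collision = 1e6       # one megabyte per collision event
dvd_capacity_bytes = 5e9        # "a standard DVD can store about 5 gigabytes"

data_rate = collisions_per_second * bytes_per_collision
print(f"raw data rate: {data_rate:.1e} bytes per second")
# -> 6.0e+14 bytes/s, i.e. of the order of 10^15 (a petabyte) once rounded up

print(f"DVD equivalents per second: {data_rate / dvd_capacity_bytes:,.0f}")
# -> 120,000; the magazine's "200 000" follows from the rounded petabyte figure

# The Oyster-card example works the same way:
journeys_per_day = 7e6          # "7 million individual daily journeys"
print(f"journeys over 5 years: {journeys_per_day * 365 * 5:.1e}")
# -> 1.3e+10, the same order of magnitude as the quoted 15 billion
```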