ABSTRACT

IP traffic volumes have been increasing rapidly in recent years [14], and recent scientific projects in various fields generate vast volumes of data each day. The Large Hadron Collider (LHC), built to study particle physics, generated 13 PB of data in 2010 (36 TB per day) [8]. The Sloan Digital Sky Survey (SDSS), which maps objects in the universe, generated 5 TB of data per year (14 GB per day) [25] over 5 years; that data is made public on the SDSS website.∗ The Square Kilometre Array (SKA) telescope project will study astrophysics using its multiple antennas and expects to gather 1 EB of data per day over the next few decades. These scientific endeavors are collaborations among international researchers and require data to be transferred between their research sites. CSIRO's ASKAP, which operates 36 SKA antennas, continuously transfers data at 2.5 Gb/s (105 TB per day) to the Pawsey Supercomputing Centre.†

The data gathered in these studies can be processed locally to reduce its size, but the processed data still needs to be transferred. The size of the processed data varies, since each project applies reduction techniques suited to its specific research, yet it remains large enough to be classified as big data. For example, SDSS provides public access to its data set even though it is larger than 100 TB.