ABSTRACT

Requirements
7.2.3 General Scientific Data Infrastructure Requirements
7.3 Architectural Epigrammatic for Analysis of Big Data
7.4 Scientific Data Infrastructure Architectural Model
7.5 Performance Parametric Considerations for Big Data Architecture
7.5.1 Sizing of Clustering
7.5.2 MapReduce Algorithm
7.5.3 Input Data Set
7.5.4 Data Node
7.5.5 Data Locality
7.5.6 Concurrent Activity
7.5.7 Network Considerations
7.6 Capacity and Scalability Consideration for Big Data Architecture
7.6.1 CPU
7.6.2 Memory
7.6.3 Disk
7.6.4 Network
7.7 Conclusion
References
Authors

The enormous volume of data produced by emerging applications, such as social networking, microblogging, and other data-centric sites, has given rise to the concept of Big Data. Big Data is characterized by continuously growing volumes of varied data types (i.e., unstructured, semistructured, and structured) arriving from sources that generate data at a very high rate (e.g., the web logs of websites). This volume of data creates new opportunities for data analysis, such as product ranking, review analysis, and fraud detection.