ABSTRACT

For various types of enterprise and scientific applications as well as cyber-physical systems (such as sensor-equipped bridges, smart buildings, and industrial machinery), processing and analyzing data is important for gaining insights and making meaningful decisions. The amount of data to be analyzed, however, is sometimes very large, and conventional processing tools and techniques cannot handle such Big Data. MapReduce, a programming model proposed by Google, simplifies massively distributed parallel processing so that very large and complex datasets can be processed and analyzed efficiently. Hadoop, a popular implementation of the MapReduce programming model, is used by many companies and institutions, typically in conjunction with cloud computing, to execute various Big Data applications, including web analytics, scientific, data mining, and enterprise data-processing applications. This chapter focuses on effective resource management algorithms and techniques for processing MapReduce jobs, including jobs with an associated completion deadline. Effective resource management strategies are crucial for processing the MapReduce jobs submitted to the system, for achieving user satisfaction, and for attaining high system performance, which includes a high quality of service as reflected in a low ratio of jobs with missed deadlines, low job response times, high job throughput, and high resource utilization.