ABSTRACT

Hadoop presents MapReduce as an analytics engine, and under the hood, it uses a distributed storage layer referred to as the Hadoop distributed file system (HDFS). As an open-source implementation of MapReduce, Hadoop is, so far, one of the most successful realizations of large-scale data-intensive cloud computing platforms. It has been realized that when and where to start the reduce tasks are the key problems to enhance MapReduce performance.