ABSTRACT

Map/Reduce is among the most popular programming models for massive data processing. Its two core concepts, Map (mapping) and Reduce (reduction), are borrowed from functional programming languages such as Lisp. The principle of the Map/Reduce programming model is that a large user task is divided into several finer-grained sub-tasks, which are then scheduled onto idle nodes in the cluster. In this way, faster nodes can process more sub-tasks, thereby reducing the overall task completion time. Developers only need to implement the logic of these two functions (Map and Reduce) and submit them to the Map/Reduce execution environment, which automatically schedules the sub-tasks on the cluster and executes them in parallel. The Map/Reduce environment is responsible for splitting the input data, scheduling tasks, handling node failures during task execution, and coordinating data communication between the different computing nodes. The Map/Reduce scheduling model is shown in Figure 1.
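As a minimal illustration of the division of labor described above, the following sketch shows a word count written as a Map function and a Reduce function, together with a toy single-process driver. The names map_fn, reduce_fn, and run_mapreduce are hypothetical and chosen only for this example; the driver runs the sub-tasks sequentially and merely stands in for the execution environment, which in a real deployment would distribute them across cluster nodes and handle failures.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_fn(word, counts):
    """Reduce: combine all values emitted for a single key."""
    return word, sum(counts)


def run_mapreduce(lines, num_splits=2):
    """Toy stand-in for the execution environment (assumption: one process)."""
    # Split the input into finer-grained sub-tasks (round-robin over lines).
    splits = [lines[i::num_splits] for i in range(num_splits)]

    # Map phase: each split would normally be handed to an idle node.
    intermediate = []
    for split in splits:
        for line in split:
            intermediate.extend(map_fn(line))

    # Shuffle: group intermediate values by key before reduction.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_fn(key, values) for key, values in groups.items())


result = run_mapreduce(["the quick fox", "the lazy dog", "the fox"])
```

The developer writes only map_fn and reduce_fn; splitting, grouping, and scheduling belong to the environment, which is why the driver is kept separate in this sketch.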