ABSTRACT

Youchan Zhu & Jing Li
North China Electric Power University, Baoding, China

ABSTRACT: Enterprise data is now stored across networked data warehouses and data centers in different regions, so data mining techniques must be able to handle distributed data storage and distributed tasks. We analyze the traditional k-Means mining algorithm, apply it to a cloud computing environment, and parallelize it with MapReduce, so that mining massive data sets costs less and runs more efficiently. An experiment verifies the effectiveness of the parallelized k-Means algorithm.