ABSTRACT

With the development of Internet technology, the volume of digital information is growing exponentially. According to the Digital Universe report issued by the International Data Corporation (IDC), the amount of data will reach 40 ZB within the next eight years, equivalent to about 5,200 GB for every person[9]. Efficiently processing and storing this volume of data is becoming a challenge for Internet enterprises. Most traditional approaches to mass data processing rely on parallel computing, grid computing, or distributed high-performance computing, all of which consume expensive storage and computing resources. In addition, allocating large-scale computing tasks effectively and partitioning them sensibly requires complex programming[2]. Distributed cloud platforms based on Hadoop offer a good way to address these problems. After introducing the core technologies of Hadoop, HDFS and MapReduce, this paper uses VMware to build an efficient, easily extensible cloud computing and storage platform based on Hadoop, and verifies through experiment the advantages of distributed computation and storage.