ABSTRACT

To address the problems currently faced in analyzing and processing power distribution big data, a big data access platform architecture suited to distribution networks is proposed. The platform consists mainly of a basic layer, a calculation layer, and an application layer. The distributed coordination server ZooKeeper is the core of the basic layer. Its most important components are the distributed file systems HDFS and Ceph, the resource manager YARN, and the observer, which together provide consistent services to the entire system. Spark SQL serves as the programming interface of the calculation layer, which incorporates computing frameworks such as GraphX for graph computation, ML for machine learning, and MapReduce to carry out the computing tasks of the entire system and to provide interface services to the application layer. The application layer is built around the HM7000 series, which provides the common functions of a power distribution system. Test results show that the annual availability of the platform's main station system equipment reaches 99.9%, the average CPU load rate is 39.6%, the number of accessible workstations is 46, and the other key indicators also meet the needs of county-level power distribution units.