ABSTRACT

The Mont-Blanc system features a digital current and voltage meter on the power supply rail of each Samsung Daughter Board. The work done in the Mont-Blanc project has been crucial to maturing the High-Performance Computing (HPC) software stack on ARM architectures: the Mont-Blanc partners developed an integrated tool stack that supports both ARM-based platforms and the OmpSs task-based programming model. The Mont-Blanc prototype has been deployed next to MareNostrum 3, the largest supercomputer in Spain, a Tier-0 machine of the Partnership for Advanced Computing in Europe (PRACE) infrastructure, and one of the top supercomputers in the world. The prototype has also allowed us to characterize DRAM errors on a large system that uses Low-Power DDR (LPDDR) memory and operates without hardware ECC. The use of commodity technologies was also considered when choosing the interconnection network and software infrastructure for the prototype, which implements two separate networks: a 1 GbE management network and a 10 GbE network for Message Passing Interface (MPI) traffic.
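The per-board current and voltage meter mentioned above makes it possible to derive power and energy figures in software. The following is a minimal sketch, assuming the meter readings have been exported as timestamped (voltage, current) samples. The `Sample` record and the `energy_joules` and `average_power_watts` functions are hypothetical illustrations, not part of the Mont-Blanc tool stack. Instantaneous power is P = V * I, and the sketch integrates it over time with the trapezoidal rule.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    """One hypothetical meter reading from a daughter board's supply rail."""
    t_s: float    # timestamp in seconds
    volts: float  # measured rail voltage
    amps: float   # measured rail current


def energy_joules(samples: Sequence[Sample]) -> float:
    """Integrate instantaneous power P = V * I over time (trapezoidal rule)."""
    if len(samples) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t_s - prev.t_s
        if dt < 0:
            raise ValueError("samples must be sorted by timestamp")
        p_prev = prev.volts * prev.amps
        p_cur = cur.volts * cur.amps
        # Area of the trapezoid between two consecutive power readings.
        total += 0.5 * (p_prev + p_cur) * dt
    return total


def average_power_watts(samples: Sequence[Sample]) -> float:
    """Mean power over the sampled interval (energy divided by elapsed time)."""
    if len(samples) < 2:
        return 0.0
    span = samples[-1].t_s - samples[0].t_s
    return energy_joules(samples) / span if span > 0 else 0.0


if __name__ == "__main__":
    # Illustrative values only, not measurements from the prototype.
    trace = [Sample(0.0, 12.0, 1.0), Sample(1.0, 12.0, 1.5), Sample(2.0, 12.0, 1.0)]
    print(energy_joules(trace), average_power_watts(trace))
```

With these made-up readings the power samples are 12 W, 18 W and 12 W, so the sketch reports 30 J over 2 s, an average of 15 W. The trapezoidal rule is used here only because it is simple and robust to uneven sampling intervals. A real deployment might instead rely on whatever aggregation the meter firmware or monitoring software provides.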