ABSTRACT

Over the last several decades, platforms for high-performance computing (HPC) have become increasingly complex. Today, the largest systems consist of tens of thousands of nodes, each equipped with one or more multicore microprocessors. Individual processor cores support additional levels of parallelism, including pipelined execution of multiple instructions, short vector operations, and simultaneous multithreading. Microprocessor-based nodes rely on deep, multi-level memory hierarchies to manage latency and improve data bandwidth to the processor cores. Subsystems for interprocessor communication and parallel I/O add further to the overall complexity of these platforms.