Target Hybrid Multi/Manycore System | 3 | Programming for Hybrid Multi

ABSTRACT

This chapter focuses on the Knights Landing (KNL) system, with its large, relatively low-bandwidth main memory and its small, high-bandwidth MCDRAM. An important additional characteristic of such a node is the memory architecture. All nodes used for high-performance computing have a non-uniform memory architecture (NUMA). The new Pascal graphics processing unit (GPU) architecture from NVIDIA complicates the typical NUMA situation, as GPUs and CPUs can be connected by PCIe, by NVLink, or by both at the same time. CUDA gives the programmer much more flexibility in utilizing the registers, caches, and other close memories in the system that support graphics. On GPU systems, high-bandwidth GDDR or HBM memory is available to the processing units in the GPU, and on some systems the GPU can also access memory on the host. Most of the KNL systems being delivered run some version of Linux.
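
Since the memory layout of a node is central here and most of these systems run Linux, the following minimal Python sketch shows one way to inspect the NUMA layout from user space. It assumes the standard Linux sysfs tree at /sys/devices/system/node. The helper names list_numa_nodes and memory_only_nodes are hypothetical and introduced only for illustration. The sketch also assumes that a memory region such as MCDRAM is exposed as a NUMA node with memory but no CPUs, which depends on how the node is configured and is not guaranteed on every system.

```python
import re
from pathlib import Path


def list_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: {"cpulist": str, "mem_total_kb": int or None}}.

    Reads the Linux sysfs NUMA tree. Returns an empty dict when the
    directory does not exist (for example, on a non-Linux system).
    """
    root = Path(sysfs_root)
    nodes = {}
    if not root.is_dir():
        return nodes
    for entry in sorted(root.iterdir()):
        match = re.fullmatch(r"node(\d+)", entry.name)
        if not match:
            continue  # skip files such as "online" or "possible"
        node_id = int(match.group(1))

        # CPUs attached to this node, e.g. "0-63"; empty if there are none.
        cpulist_file = entry / "cpulist"
        cpulist = cpulist_file.read_text().strip() if cpulist_file.exists() else ""

        # Per-node meminfo lines look like "Node 0 MemTotal:  98765432 kB".
        mem_total_kb = None
        meminfo_file = entry / "meminfo"
        if meminfo_file.exists():
            for line in meminfo_file.read_text().splitlines():
                found = re.search(r"MemTotal:\s+(\d+)\s+kB", line)
                if found:
                    mem_total_kb = int(found.group(1))
                    break

        nodes[node_id] = {"cpulist": cpulist, "mem_total_kb": mem_total_kb}
    return nodes


def memory_only_nodes(nodes):
    """Return sorted IDs of nodes that report memory but no CPUs.

    Assumption: a separately addressable high-bandwidth memory shows up
    this way; this is a heuristic, not a definitive test.
    """
    return sorted(
        node_id
        for node_id, info in nodes.items()
        if info["cpulist"] == "" and info["mem_total_kb"]
    )


if __name__ == "__main__":
    layout = list_numa_nodes()
    for node_id, info in sorted(layout.items()):
        print(node_id, info["cpulist"] or "(no CPUs)", info["mem_total_kb"], "kB")
    print("memory-only nodes:", memory_only_nodes(layout))
```

On a system with numactl installed, `numactl --hardware` reports comparable information. The sketch reads sysfs directly only to keep the example free of external dependencies.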