ABSTRACT

High-end supercomputers are inevitably massively parallel systems, and their compute nodes often have complex architectures. Moreover, exploiting the performance potential of even a single CPU is becoming increasingly difficult, e.g., because of limited memory performance. Such architectural restrictions pose great challenges to the development of efficient computational methods and algorithms.