ABSTRACT

Hybrid GPU and multicore platforms deliver excellent performance on massively parallel and vector workloads, and they have become some of the most popular processing elements (PEs) for building modern parallel computers. Energy saving is an important issue in the design and development of high performance computing (HPC) systems, because large-scale scientific computing can lead to enormous energy consumption. The power efficiency of a multiprocessing system depends not only on the electrical features of its hardware components but also on high-level algorithms and programming paradigms. Enhancing the utilization of each individual PE so that it reaches its best computation capability and power efficiency is therefore important for optimizing overall system power performance. In this chapter, a power model based on measurement and experimental evaluation of SIMD computations on GPU and multicore architectures is introduced. Three primary energy-aware CUDA algorithm design methods are investigated and illustrated: building a processing element from a single CPU core and parallel GPUs; splitting GPU workloads to CPU components; and removing GPU computing overhead by executing small tasks with CPU functions. The improvements these approaches bring to computation time and power consumption are validated by examining CUDA programs executing on real systems.