ABSTRACT

Today, multicore CPU clusters and GPU clusters are cheap and extensible parallel architectures, achieving high performance on a wide range of scientific applications. However, depending on the parallel algorithm used to solve the addressed problem and on the available hardware features, the relative computing and energy performance of these clusters can vary. In fact, modern clusters combine several levels of parallelism. Current cluster nodes commonly have several CPU cores, each core providing SSE units (Streaming SIMD Extensions: small vector computing units sharing the CPU memory), and it is easy to install one or several GPU cards (graphics processing units: large vector computing units with their own memory) in each node. So, different kinds of computing kernels can be developed to perform computations on the same node: some for the CPU cores, some for the SSE units, and others for the GPUs. Several combinations of these kernels can then be used, considering:

1. A cluster of multicore CPUs
2. A cluster of GPUs
3. A cluster of multicore CPUs with SSE units
4. A hybrid cluster of both GPUs and multicore CPUs
5. A hybrid cluster of both GPUs and multicore CPUs with SSE units

Each solution exploits a specific hardware configuration and requires specific programming, and the different solutions lead to different execution times and energy consumption. Moreover, the optimal combination of kernel and hardware configuration also depends on the problem and its data size.