ABSTRACT

Wesley Alvaro Department of Electrical Engineering and Computer Science, University of Tennessee

Jakub Kurzak Department of Electrical Engineering and Computer Science, University of Tennessee

Jack Dongarra Department of Electrical Engineering and Computer Science, University of Tennessee; Computer Science and Mathematics Division, Oak Ridge National Laboratory; School of Mathematics & School of Computer Science, Manchester University

Dense matrix multiplication is one of the most common numerical operations, especially in dense linear algebra, where it forms the core of many important algorithms, including solvers for linear systems of equations, least squares problems, and singular value and eigenvalue problems. The Cell B. E. excels at processing compute-intensive workloads such as matrix multiplication in single precision, thanks to its powerful SIMD capabilities. This chapter dissects implementations of two single precision matrix multiplication kernels for the SIMD cores of the Cell B. E. (the SPEs), one implementing the C = C − A × B^T operation and the other implementing the C = C − A × B operation, for fixed-size matrices of 64 × 64 elements. The unique dual-issue architecture of the SPEs allows the floating-point operations to be well balanced against the memory and permutation operations, leading to utilization of the floating-point pipeline in excess of 99% in both cases.
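
For orientation, the two operations named above can be written as a plain scalar reference in C. This is only a sketch of what the kernels compute, not the SIMD implementations the chapter describes: it assumes row-major storage, and the function names sgemm_tile_nt and sgemm_tile_nn and the macro TILE are hypothetical, introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 64 /* fixed matrix size from the text: 64 x 64 elements */

/* Scalar reference for C = C - A * B^T (row-major storage is an assumption). */
void sgemm_tile_nt(float *c, const float *a, const float *b)
{
    assert(c && a && b);
    for (size_t i = 0; i < TILE; i++) {
        for (size_t j = 0; j < TILE; j++) {
            float acc = 0.0f;
            /* Row i of A times row j of B, i.e. column j of B^T. */
            for (size_t k = 0; k < TILE; k++)
                acc += a[i * TILE + k] * b[j * TILE + k];
            c[i * TILE + j] -= acc;
        }
    }
}

/* Scalar reference for C = C - A * B (row-major storage is an assumption). */
void sgemm_tile_nn(float *c, const float *a, const float *b)
{
    assert(c && a && b);
    for (size_t i = 0; i < TILE; i++) {
        for (size_t j = 0; j < TILE; j++) {
            float acc = 0.0f;
            /* Row i of A times column j of B. */
            for (size_t k = 0; k < TILE; k++)
                acc += a[i * TILE + k] * b[k * TILE + j];
            c[i * TILE + j] -= acc;
        }
    }
}
```

Each call performs 64 × 64 × 64 multiply-add pairs. In the transposed variant both operands are read along rows, whereas the non-transposed variant walks B by columns; on a SIMD core this difference in access pattern changes the data rearrangement that must accompany the arithmetic.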