Implementing Matrix Multiplication on the Cell B. E.
Wesley Alvaro Department of Electrical Engineering and Computer Science, University of Tennessee
Jakub Kurzak Department of Electrical Engineering and Computer Science, University of Tennessee
Jack Dongarra Department of Electrical Engineering and Computer Science, University of Tennessee; Computer Science and Mathematics Division, Oak Ridge National Laboratory; School of Mathematics & School of Computer Science, Manchester University
Dense matrix multiplication is one of the most common numerical operations, especially in dense linear algebra, where it forms the core of many important algorithms, including solvers for linear systems of equations, least squares problems, and singular value and eigenvalue problems. The Cell B. E. excels at compute-intensive workloads such as single precision matrix multiplication, thanks to its powerful SIMD capabilities. This chapter dissects implementations of two single precision matrix multiplication kernels for the SIMD cores of the Cell B. E. (the SPEs), one implementing the operation C = C − A × B^T and the other the operation C = C − A × B, for fixed-size matrices of 64 × 64 elements. The unique dual-issue architecture of the SPEs provides a good balance between floating-point operations on one side and memory and permutation operations on the other, which leads to utilization of the floating-point pipeline in excess of 99% in both cases.
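To make the two operations concrete, here is a minimal scalar reference sketch in plain C, assuming row-major 64 × 64 single precision arrays. It is not the SIMD SPE kernel described in this chapter; it only shows what each kernel computes, and the function names sgemm_nt_ref and sgemm_nn_ref are hypothetical. Each call performs 2 × 64³ = 524,288 floating-point operations (one multiply and one subtract per inner-loop step).

```c
#define N 64  /* fixed tile size used by both kernels */

/* Floating-point operations per call: N^3 multiplies plus N^3 subtractions. */
#define FLOPS_PER_CALL (2L * N * N * N)

/* C = C - A * B^T (hypothetical reference name).
 * Element (i,j) is updated with the dot product of row i of A and row j of B,
 * so both operands are read along contiguous rows. */
static void sgemm_nt_ref(float C[N][N], float A[N][N], float B[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float acc = C[i][j];
            for (int k = 0; k < N; k++)
                acc -= A[i][k] * B[j][k];
            C[i][j] = acc;
        }
}

/* C = C - A * B (hypothetical reference name).
 * The i-k-j loop order streams along rows of B and C, subtracting
 * A[i][k] times row k of B from row i of C. */
static void sgemm_nn_ref(float C[N][N], float A[N][N], float B[N][N])
{
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++) {
            float a = A[i][k];
            for (int j = 0; j < N; j++)
                C[i][j] -= a * B[k][j];
        }
}
```

Passing a pre-transposed copy of B to sgemm_nt_ref gives the same result as passing B itself to sgemm_nn_ref, which is a convenient way to cross-check the two variants.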