ABSTRACT

This chapter looks at several small examples that were run on Intel Knights Landing and Intel Haswell systems, starting with the traditional matrix multiply loop. Several things may cause the compiler to reject vectorization as inefficient: excessive gathers and scatters within the loop, excessive striding within the loop, complex decision processes in the loop, and loops that are too short. When performing outer-loop vectorization, rank expansion can be an important issue. The legacy vector computers had special hardware for vectorizing conditional blocks of code inside loops, that is, blocks controlled by IF statements. When a loop-dependent IF test is present in a loop, Knights Landing uses predicated execution: the test sets a mask register, and the computation is then performed only for those elements where the condition is true. The Intel compiler's diagnostics are interesting in that the compiler computes the benefit of vectorization and then decides not to vectorize the loop.
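The predicated execution described above can be pictured with a small scalar sketch in C. This is an illustration only, not code from the chapter: the function names `update_branchy` and `update_masked`, the test `b[i] > 0.0`, and the division are all assumptions chosen for the example. The first routine is the loop as a programmer would write it; the second spells out, one element at a time, roughly what a masked vector operation does: evaluate the condition into a mask, compute for every lane, and keep the result only where the mask is set.

```c
#include <stddef.h>

/* Loop with a loop-dependent IF test, as it might appear in source code.
   (Hypothetical example; the condition and arithmetic are assumptions.) */
void update_branchy(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (b[i] > 0.0) {
            a[i] = a[i] / b[i];
        }
    }
}

/* Scalar model of predicated execution: the test becomes a mask, the
   arithmetic is done for every element, and the mask selects which
   results are stored. Real hardware applies the mask to whole vectors. */
void update_masked(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int mask = b[i] > 0.0;               /* 1 where the condition holds */
        double safe_b = mask ? b[i] : 1.0;   /* harmless divisor in masked-off lanes */
        double result = a[i] / safe_b;       /* computed for all lanes */
        a[i] = mask ? result : a[i];         /* keep the old value where mask is 0 */
    }
}
```

Both routines leave `a` in the same state. The second form has no branch in the loop body, which is the property that lets a compiler map it onto masked vector instructions.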