ABSTRACT

The BLAS library [24], [25], [44] defines a standard for the building blocks on which efficient modern matrix computing is organized. In this chapter we define these types of operations and study their levels of complexity, both in terms of the computer arithmetic and the memory references involved in a BLAS implementation. To exploit the full power of modern processors, it is important to maintain a high ratio between the number of floating-point operations (flops) and the number of memory references, which determines the communication time between memory and processor.
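
As a brief illustrative sketch (the operation counts below are standard estimates, not taken from the original text): for the Level 3 operation $C \leftarrow C + AB$ with $n \times n$ matrices, about $2n^3$ flops are performed on roughly $4n^2$ matrix elements, so the ratio grows with $n$; for a Level 1 operation such as axpy, $y \leftarrow \alpha x + y$, the ratio is bounded by a constant.
\[
\underbrace{\frac{\text{flops}}{\text{memory references}} \approx \frac{2n^3}{4n^2} = \frac{n}{2}}_{\text{Level 3: } C \leftarrow C + AB}
\qquad
\underbrace{\frac{2n}{3n} = \frac{2}{3}}_{\text{Level 1: } y \leftarrow \alpha x + y}
\]
This is why Level 3 operations can approach peak processor speed while Level 1 operations are limited by memory bandwidth.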