ABSTRACT

Since memory reads, or loads, are very frequent, memory latency, that is, the time it takes memory to respond to a request, can impact performance significantly. Today, reading data from main memory requires more than 100 processor cycles, while in “typical” programs about one in five instructions reads from memory. A naively built multi-GHz processor that executes instructions sequentially would thus have to spend most of its time simply waiting for memory to respond. The overall performance of such a processor would not be noticeably better than that of a processor operating with a much slower clock (on the order of a few hundred MHz). Clearly, increasing processor speed without also finding a way to make memory respond faster makes little sense. Ideally, the memory latency problem would be attacked directly. Unfortunately, it is practically impossible to build a memory that is simultaneously large, fast, and cost-effective. While it is presently impossible to make memory respond quickly to all requests, it is possible to make it respond quickly to some requests. The more requests it can serve quickly, the higher the overall performance. This is the goal of traditional memory hierarchies, where a collection of faster but smaller memory devices (commonly referred to as caches) provides faster access to a dynamically changing subset of memory data. Given the limited size of caches and imperfections in the caching policies, memory hierarchies provide only a partial solution to the memory latency problem.
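To make the cost concrete, consider a back-of-the-envelope estimate. The 100-cycle latency and the one-in-five load ratio are the figures quoted above, taken in the worst case where every load goes all the way to main memory; the 2 GHz clock is an assumed figure for a multi-GHz processor:

    cycles per instruction  ~  1 + (1/5) x 100  =  21
    effective speed         ~  2 GHz / 21       ~  95 MHz

In other words, the multi-GHz clock buys almost nothing in this regime: nearly all of the time goes to waiting on memory.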

FIGURE 14.1: (a) Code sequence that reads and uses memory data. (b) Execution timing showing how memory latency impacts execution time with simple, in-order execution. (c) Making memory latency shorter reduces execution time. (d) Sending the load request earlier reduces wait time: memory processing is overlapped with other useful work.
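The contrast between panels (a)/(b) and panel (d) can be sketched in C. This is not the chapter's own code; the summation loop, the function names, and the 16-element prefetch distance are illustrative assumptions, and __builtin_prefetch is a GCC/Clang builtin that merely hints the request to the hardware, which is free to ignore it:

    #include <stddef.h>

    /* Pattern of Figure 14.1(a)/(b): each load is issued only when its
       value is needed, so every miss stalls an in-order processor for
       the full memory latency. */
    long sum_naive(const long *data, size_t n)
    {
        long sum = 0;
        for (size_t i = 0; i < n; i++)
            sum += data[i];                     /* load, then immediate use */
        return sum;
    }

    /* Pattern of Figure 14.1(d): the request for a future element is
       sent early, so the memory access overlaps the useful work done on
       the current elements. The distance of 16 elements ahead is an
       illustrative choice, not a tuned value. */
    long sum_early(const long *data, size_t n)
    {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
            if (i + 16 < n)
                __builtin_prefetch(&data[i + 16]);  /* send load request early */
            sum += data[i];                     /* data is, with luck, already
                                                   on its way when this executes */
        }
        return sum;
    }

Note that the early request does not make memory any faster; as panel (d) depicts, the wait is hidden behind other useful work rather than eliminated.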