ABSTRACT

Register files are known to be a notoriously power-consuming part of a processor architecture. Several works [1-3] have already shown the need for a comprehensive treatment of register files that reduces their power consumption while still meeting all the real-time requirements of an application. Multi-ported data register files (RFs) are among the most power-hungry parts of any processor, especially of very long instruction word (VLIW) processors [4,5]. On average, every operation requires three RF accesses (two reads and one write), which makes the RF one of the most active parts of the processor. Current architectures achieve high performance by exploiting parallelism and therefore perform multiple operations per cycle (e.g., instruction-level parallelism, or ILP, as exploited in VLIW processors). This quickly drives up the number of ports the register file must provide, and the register file is usually implemented as a large multi-ported structure, either centralized or distributed. A large number of ports strongly degrades the energy efficiency of a register file and also imposes tight performance constraints on its design. Traditionally, this problem is addressed through various clustering techniques [5] that partition (or bank) the RF. Data can then be passed from one partition to another only through intercluster communication [6,7]. However, as partitions get smaller, the cost of intercluster copies quickly grows. In addition, the resulting register files are still multi-ported. For high energy efficiency, it is clearly preferable for the register cells to be single-ported [8].
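To make the port-count argument concrete, the sketch below counts the ports each RF partition needs when an issue-width-N machine performs two reads and one write per operation, with the operations spread evenly over the clusters. The class name `RegFileConfig` and its fields are hypothetical, and the quadratic area factor is an assumption: it is a rough first-order rule of thumb (each port adds a wordline and a bitline to every cell), not a figure taken from this work. Intercluster copy ports are deliberately ignored.

```python
from dataclasses import dataclass

# Per-operation RF traffic stated in the text: two reads and one write.
READS_PER_OP = 2
WRITES_PER_OP = 1


@dataclass
class RegFileConfig:
    """Hypothetical model of a (possibly clustered) register file."""

    issue_width: int   # operations issued per cycle (ILP)
    clusters: int = 1  # number of RF partitions; 1 means a centralized RF

    def ports_per_rf(self) -> tuple[int, int]:
        """Return (read_ports, write_ports) needed by each partition.

        Assumes operations are spread evenly across clusters and ignores
        the extra ports needed for intercluster copies.
        """
        if self.issue_width % self.clusters != 0:
            raise ValueError("issue_width must be divisible by clusters")
        ops_per_cluster = self.issue_width // self.clusters
        return ops_per_cluster * READS_PER_OP, ops_per_cluster * WRITES_PER_OP

    def relative_cell_area(self) -> int:
        """Relative area of one register cell, in arbitrary units.

        ASSUMPTION: cell area grows with the square of the port count.
        With the total number of registers held constant, total RF area
        scales by the same factor.
        """
        reads, writes = self.ports_per_rf()
        ports = reads + writes
        return ports * ports


if __name__ == "__main__":
    for c in (1, 2, 4, 8):
        cfg = RegFileConfig(issue_width=8, clusters=c)
        print(c, cfg.ports_per_rf(), cfg.relative_cell_area())
```

With `issue_width=8`, a centralized RF needs 16 read and 8 write ports, whereas eight single-issue clusters need only 2 read ports and 1 write port each. Even in that extreme case each partition is still multi-ported, and the model leaves out the intercluster copy ports whose cost grows as the partitions shrink.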