ABSTRACT

Processor architectures have improved performance by exploiting instruction-level parallelism, and the gains achieved by superscalar processors are particularly remarkable. However, superscalar processors cannot exploit all of the parallelism in a program, because they are limited in their ability to extract instruction-level parallelism from a single process or a single thread [3]. Multithreading processors, which exploit thread-level parallelism, have therefore been studied as an alternative. A simultaneous multithreading (SMT) processor [7][9] executes two or more processes or threads simultaneously and thereby improves throughput. A typical commercial example of an SMT processor is the Pentium 4 with Hyper-Threading Technology [9].

CONTENTS

6.1 Introduction
6.2 Continuation-Based Multithreading Model
    6.2.1 Continuation
    6.2.2 Thread and Instance
6.3 Thread Programming Technique
    6.3.1 Data-Driven Execution
    6.3.2 Demand-Driven Execution
    6.3.3 Thread Pipelining
6.4 Fuce Processor
    6.4.1 Thread Execution Unit
    6.4.2 Register Files
    6.4.3 Thread Activation Controller
6.5 Implementation on FPGA
    6.5.1 Hardware Cost of the Fuce Processor
    6.5.2 Simulation Result
6.6 Conclusion
Acknowledgments
References