ABSTRACT

Barcelona Supercomputing Center, Universitat Politècnica de Catalunya, Barcelona, Spain

Adrián Cristal

Barcelona Supercomputing Center/CSIC — Spanish National Research Council, Spain

Satnam Singh

Google Inc., Mountain View, CA, USA

6.1 Introduction
    6.1.1 FPGAs for Architectural Investigation
    6.1.2 Transactional Memory
    6.1.3 Contributions
6.2 The TMbox Architecture
    6.2.1 Interconnection
6.3 Hybrid TM Support for TMbox
    6.3.1 Instruction Set Architecture Extensions
    6.3.2 Bus Extensions
    6.3.3 Cache Extensions
6.4 Experimental Evaluation
    6.4.1 Architectural Benefits and Drawbacks
    6.4.2 Experimental Results
6.5 Related Work
6.6 Conclusions
Acknowledgments

In this chapter, we present the design and implementation of TMbox: a multiprocessor system-on-chip (MPSoC) built to explore trade-offs in the multicore design space and to evaluate recent parallel programming proposals such as Transactional Memory (TM). For this work, we evaluate a 16-core Hybrid Transactional Memory implementation based on the TinySTM-ASF (Software Transactional Memory – Advanced Synchronization Facility) proposal on a Virtex-5 FPGA, and we accelerate three benchmarks written to investigate TM trade-offs. Our flexible system, composed of MIPS R3000-compatible cores, is easily modifiable to study different architecture, library, or operating system extensions.