ABSTRACT

Department of Electrical and Computer Engineering, University of Massachusetts, Lowell, MA, USA

1.1 Overview of the State of the Art in High-Level FPGA Programming

1.2 Introduction to the MORA Framework
    1.2.1 MORA Concept
    1.2.2 MORA Tool Chain

1.3 The MORA Reconfigurable Cell
    1.3.1 Processing Element
    1.3.2 Control Unit and Address Generator
    1.3.3 Asynchronous Handshake
    1.3.4 Execution Model

1.4 The MORA Intermediate Representation
    1.4.1 Expression Language
    1.4.2 Coordination Language
    1.4.3 Generation Language
    1.4.4 Assembler

1.5 MORA-C++ API
    1.5.1 Key Features
    1.5.2 MORA-C++ by Example
    1.5.3 MORA-C++ Compilation
    1.5.4 Floating-Point Compiler (FloPoCo) Integration

1.6 Hardware Infrastructure for the MORA Framework
    1.6.1 Direct Memory Access (DMA) Channel Multiplexing
    1.6.2 Vectorized RC Support
    1.6.3 Shared Memory Access

1.7 Results
    1.7.1 Thousand-Core Implementation

    1.7.2 Results
    1.7.3 Comparison with Other DCT Implementations

1.8 Conclusion and Future Work

This chapter presents an overview of the current state of the MORA framework, a high-level programmable multicore FPGA system based on a dataflow network of Processors-in-Memory. MORA aims to simplify dataflow-based FPGA programming while still delivering high performance, by providing a streaming dataflow framework that can be programmed in C++ through a dedicated Application Programming Interface (API). Because MORA adopts processors rather than LUTs as the smallest unit of the design, many of the restrictions common to other C-to-gates tools do not apply. MORA’s processors are unique in that each one is specialised in instruction set, data path width, and memory size for the particular section of the program that runs on it. As a result of this specialisation, we have demonstrated an image processing application implemented using over a thousand cores.