ABSTRACT
This chapter discusses software-level optimizations for CapsNet training and inference, hardware designs of the processing element (PE) array, and memory organizations for CapsNet inference, integrated with post-training optimizations such as quantization and approximate designs. Section 3.1 proposes a framework for efficiently training CapsNets. Although the remaining sections focus on optimizing CapsNet inference, the training methodology is key to achieving high accuracy within a reasonable training time, which also benefits Chapters 4 and 5. Section 3.2 presents an efficient hardware accelerator for CapsNet inference, and Section 3.3 discusses a more comprehensive design space exploration (DSE) of the PE-array-based architecture. Section 3.4 discusses a DSE and design flow for the memory organizations in CapsNet accelerators. Section 3.5 presents a quantization framework for obtaining compact CapsNet models within a constrained memory budget. Further energy efficiency can be achieved by approximating the hardware design of the PE array, as discussed in Section 3.6, or by approximating the most compute-intensive activation functions, such as Squash and Softmax, as presented in Section 3.7.
