ABSTRACT

Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 5.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134 5.3.2 Public key Cryptographic Algorithms . . . . . . . . . . . . . . . . . . . 135

5.3.2.1 The Rivest-Shamir-Aldleman (RSA) algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

5.3.2.2 Elliptic curve cryptography . . . . . . . . . . . . . . . . . . . 137 5.3.2.3 Elliptic curve over GF(p) . . . . . . . . . . . . . . . . . . . . . 138 5.3.2.4 Elliptic curve point multiplication . . . . . . . . . . . . 139 5.3.2.5 Elliptic curve points: Projective coordinates . 141 5.3.2.6 Modular arithmetic and RNS . . . . . . . . . . . . . . . . . 142

5.3.3 RNS Forward/Reverse Conversions . . . . . . . . . . . . . . . . . . . . . 145 5.3.4 RNS Addition/Subtraction and Multiplication . . . . . . . . . 152 5.3.5 RNS Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

5.3.5.1 Basis extension with offset . . . . . . . . . . . . . . . . . . . . 159 5.3.5.2 Using several Montgomery domains . . . . . . . . 160

5.3.6 Mapping Algorithms to the RNS . . . . . . . . . . . . . . . . . . . . . . . . . 161 5.3.6.1 Non-RNS standard optimization . . . . . . . . . . . . . 162 5.3.6.2 Obtain the RNS domains . . . . . . . . . . . . . . . . . . . . . . 162 5.3.6.3 Identify RNS operations . . . . . . . . . . . . . . . . . . . . . . . 163 5.3.6.4 Merge multiplications with base extensions . 165 5.3.6.5 Montgomery domains and conversions . . . . . 165 5.3.6.6 Remove inefficient basis extensions . . . . . . . . . . 166 5.3.6.7 Evaluate dynamic ranges . . . . . . . . . . . . . . . . . . . . . 166 5.3.6.8 Extract conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 5.3.6.9 Extract register and constant map . . . . . . . . . . . . 167

5.3.7 Implementing an RNS-based Algorithm . . . . . . . . . . . . . . . . 167 5.3.7.1 GPU implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 168 5.3.7.2 FPGA implementation . . . . . . . . . . . . . . . . . . . . . . . . . 170

5.3.8 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 5.3.8.1 GPU implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 174 5.3.8.2 FPGA implementation . . . . . . . . . . . . . . . . . . . . . . . . . 176 5.3.8.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

5.3.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

5.1 Introduction Almost all forms of data communication in the modern day require security transmission and storage. This can be due to requirements to secure data residing on the application platform or due to the support of communication protocols. In particular, private key and public key cryptography are used to guarantee secure communication of information. Private key cryptography uses a secret key distributed to sender and receiver prior to the communica-

channel. most widely used private key ciphering algorithm is the Advanced Encryption Standard (AES) [307]. The AES algorithm is discussed in detail in Chapter 3. In contrast, public key cryptography is based on no prior knowledge of a secret key by the sender and receiver before data transmission which requires complex mathematical functions that can be efficiently computed but are difficult to invert. Public key cryptography is based on a distributed method to create a secure communication channel using a mathematically computed secret. Public key cryptography algorithms support digital signatures that can be used to validate data from a given signature generated by the sender. Two popular public key cryptography algorithms in wide use today are the Rivest-Shamir-Adleman (RSA) algorithm and the Elliptic Curve Cryptography (ECC) algorithm. Efficient implementations of public key cryptography algorithms are continuously being explored and implemented largely as high-performance hardware accelerators to support the large computational demands. These accelerators typically take advantage of efficient parallelization and programmability to improve flexibility and enable configuration.