ABSTRACT
GPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455 20.4 A hybrid format for SpMV on GPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
20.4.1 Dense format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459 20.4.2 Sliced COO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460 20.4.3 Determining the cut-off point of each format . . . . . . . . . . . 464
20.5 SCOO for single-precision floating-point matrices . . . . . . . . . . . . . . . 465 20.6 Performance evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
20.6.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466 20.6.2 Experimental results of SpMV on NFS matrix . . . . . . . . . 466 20.6.3 Experimental results of SpMV on floating-point
matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 20.6.3.1 Performance comparison to existing GPU
formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 20.6.3.2 Performance comparison to a CPU
implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 20.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
The Number Field Sieve (NFS) is the current state-of-the-art integer factorization method. It requires the solution of a large sparse linear system over Galois Field GF(2) (called the linear algebra step). The Block Wiedemann (BW) [8] algorithm can be used to solve such a large sparse linear system efficiently using iterative sparse matrix vector multiplication (SpMV).