ABSTRACT

GPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 455 20.4 A hybrid format for SpMV on GPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459

20.4.1 Dense format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459 20.4.2 Sliced COO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460 20.4.3 Determining the cut-off point of each format . . . . . . . . . . . 464

20.5 SCOO for single-precision floating-point matrices . . . . . . . . . . . . . . . 465 20.6 Performance evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466

20.6.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 466 20.6.2 Experimental results of SpMV on NFS matrix . . . . . . . . . 466 20.6.3 Experimental results of SpMV on floating-point

matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 20.6.3.1 Performance comparison to existing GPU

formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467 20.6.3.2 Performance comparison to a CPU

implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469 20.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471

The Number Field Sieve (NFS) is the current state-of-the-art integer factorization method. It requires the solution of a large sparse linear system over Galois Field GF(2) (called the linear algebra step). The Block Wiedemann (BW) [8] algorithm can be used to solve such a large sparse linear system efficiently using iterative sparse matrix vector multiplication (SpMV).