CONTENTS

6.1 Introduction
6.2 General Considerations of Inter-GPU Parallelization
    6.2.1 Typical Structure of an IR Algorithm
    6.2.2 Multi-GPU System Setup and Overall Structure
    6.2.3 Dataset Management
6.3 Multi-GPU Implementation
    6.3.1 Forward and Backward Projections
    6.3.2 Regularizations and Multiple Resolution
6.4 Experimental Results
6.5 Conclusion and Discussions
References

lower CBCT imaging dose to the patient, addressing the clinical concern about the excessive cumulative imaging dose over the whole treatment course of CBCT-based IGRT [25-27]. However, computational inefficiency is a major obstacle that prevents IR from being applied in the clinic. This is mainly due to the large problem size and the iterative nature of the algorithm. Specifically, an IR algorithm usually reconstructs a CBCT image by solving an optimization problem with an iterative numerical algorithm. Within each iteration, a forward projection and a backward projection are typically computed, each with a complexity similar to that of an FDK-type reconstruction algorithm. Because a number of iterations are required to reach clinically acceptable image quality, the overall computation time is much longer than that of a typical FDK algorithm. In addition, the FDK algorithm performs the back projection sequentially, making it feasible to conduct reconstruction immediately after data acquisition starts. In contrast, an IR method requires all the projections at the initialization stage, which prohibits concurrent execution of data acquisition and reconstruction. Although graphics processing units (GPUs) have recently been employed to accelerate the IR process [14,19,28-33], it is still necessary to further boost the efficiency for the time-critical IGRT environment.
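The per-iteration structure described above (one forward projection and one backward projection per step) can be illustrated with a minimal sketch. It is not the algorithm used in this chapter: it assumes a plain Landweber-type gradient descent on a least-squares objective, and it replaces the CBCT projector with a tiny dense matrix `A`. The names `forward_project`, `back_project`, and `iterative_reconstruct`, as well as the step size and iteration count, are hypothetical choices made only for illustration.

```python
def forward_project(A, x):
    """Stand-in forward projection: compute A x (image -> projections)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def back_project(A, r):
    """Stand-in backward projection: compute A^T r (projections -> image)."""
    n_cols = len(A[0])
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n_cols)]


def iterative_reconstruct(A, b, step, n_iter):
    """Landweber-type gradient descent on 0.5 * ||A x - b||^2 (an assumed
    objective, chosen for simplicity). Returns the estimate and the number
    of projector applications performed."""
    x = [0.0] * len(A[0])
    n_ops = 0
    for _ in range(n_iter):
        # One forward projection per iteration: compare with measured data.
        residual = [p - m for p, m in zip(forward_project(A, x), b)]
        # One backward projection per iteration: map the residual to image space.
        gradient = back_project(A, residual)
        x = [xj - step * gj for xj, gj in zip(x, gradient)]
        n_ops += 2
    return x, n_ops


# Toy problem: 3 "projection" measurements of a 2-pixel "image".
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_true = [2.0, 3.0]
b = forward_project(A, x_true)
x_rec, n_ops = iterative_reconstruct(A, b, step=0.3, n_iter=200)
```

In this toy setting the number of projector applications grows as twice the iteration count, which mirrors why the total cost of an iterative method is a multiple of a single FDK-like pass. The loop also needs the full measurement vector `b` before the first iteration, consistent with the remark that IR cannot start while projections are still being acquired.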