ABSTRACT

This chapter presents a two-level constraint solver designed for graphics processing units (GPUs). It describes pipelined batching, which overcomes issues of the previously presented local batching, and provides a detailed performance analysis. The chapter first gives a general description of the solver and then presents a GPU implementation. Rigid body simulation generally consists of broad-phase collision detection, narrow-phase collision detection, and constraint solving. The two-level constraint solver performs two preparation steps: global split and local batching. Local batching is especially challenging for GPUs because the naïve implementation of batching is a serial algorithm. The chapter demonstrates pipelined local batching, which parallelizes local batching by pipelining the operation and using a SIMD lane as a stage of the pipeline. Whereas the global constraint solver dispatches a kernel for each batch, the two-level constraint solver always executes four kernels, in which a SIMD of the GPU repeats in-SIMD dispatches by itself until all the constraints belonging to its group are solved.
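To illustrate why naïve batching is serial, the following minimal sketch shows one common greedy formulation. It is an assumption for illustration, not the chapter's implementation. Each constraint is modeled as a pair of body indices, and a batch may not contain two constraints that share a body. The function name `batch_constraints` and the data layout are hypothetical, and static bodies (which could be shared freely) are ignored for simplicity.

```python
def batch_constraints(constraints):
    """Greedy serial batching (illustrative sketch, not the chapter's code).

    constraints: list of (body_a, body_b) index pairs.
    Returns a list with the batch index assigned to each constraint.
    """
    bodies_in_batch = []  # bodies_in_batch[b] = set of bodies already used by batch b
    batch_of = []
    for body_a, body_b in constraints:
        # Find the first batch in which neither body is used yet.
        for b, used in enumerate(bodies_in_batch):
            if body_a not in used and body_b not in used:
                break
        else:
            # No existing batch fits, so open a new one.
            b = len(bodies_in_batch)
            bodies_in_batch.append(set())
        bodies_in_batch[b].update((body_a, body_b))
        batch_of.append(b)
    return batch_of


constraints = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
print(batch_constraints(constraints))  # [0, 1, 0, 1, 2]
```

The assignment of each constraint depends on which bodies earlier constraints have already claimed, so the loop cannot simply be split across threads. This dependency is what the pipelined variant described in the chapter is meant to address.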