ABSTRACT

Least-squares policy iteration, explained in Chapter 2, works well given appropriate basis functions for value function approximation. Owing to its smoothness, the Gaussian kernel is a popular and useful choice as a basis function. However, it cannot approximate functions with discontinuities, which arise in many reinforcement learning tasks. In this chapter, we introduce alternative basis functions based on geodesic Gaussian kernels (GGKs), which exploit the nonlinear manifold structure induced by the Markov decision process (MDP). The details of GGKs are explained in Section 3.1, and their relation to other basis function designs is discussed in Section 3.2. Experimental performance is then numerically evaluated in Section 3.3, and the chapter is concluded in Section 3.4.
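
As a brief illustration of the idea made precise in Section 3.1, the sketch below constructs a geodesic Gaussian kernel on a small state graph: the Euclidean distance in the ordinary Gaussian kernel is replaced by the shortest-path (geodesic) distance between states, so the kernel respects walls and other structure of the MDP rather than cutting across them. The graph, function names, and parameter values here are illustrative assumptions, not the chapter's implementation.

```python
import heapq
import math

def geodesic_distances(adjacency, source):
    """Dijkstra shortest-path distances from `source` over a weighted graph.

    `adjacency` maps each state to a list of (neighbor, edge_cost) pairs;
    edges are assumed to reflect the transition structure of the MDP.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, s = heapq.heappop(heap)
        if d > dist.get(s, math.inf):
            continue  # stale queue entry
        for t, w in adjacency.get(s, []):
            nd = d + w
            if nd < dist.get(t, math.inf):
                dist[t] = nd
                heapq.heappush(heap, (nd, t))
    return dist

def geodesic_gaussian_kernel(adjacency, center, sigma):
    """Basis function phi(s) = exp(-d_geo(s, center)^2 / (2 sigma^2))."""
    dist = geodesic_distances(adjacency, center)
    return {s: math.exp(-(d ** 2) / (2.0 * sigma ** 2))
            for s, d in dist.items()}

# Illustrative 4-state chain: state 3 is reachable only through state 2,
# so states 0 and 3 are geodesically far apart even if their coordinates
# happened to be close in Euclidean space.
graph = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 1.0)],
    2: [(1, 1.0), (3, 1.0)],
    3: [(2, 1.0)],
}
phi = geodesic_gaussian_kernel(graph, center=0, sigma=1.0)
print(phi)  # kernel values decay with graph distance from state 0
```

Because the kernel decays along the graph rather than through it, a discontinuity such as a wall between two nearby states is preserved: the two states receive very different basis values whenever the shortest path connecting them is long.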