ABSTRACT

The direct policy search methods explained in Chapters 7 and 8 are useful for solving problems with continuous actions, such as robot control. However, their policy updates tend to be unstable. In this chapter, we introduce an alternative policy search method called policy-prior search, which is adopted in the PGPE (policy gradients with parameter-based exploration) method (Sehnke et al., 2010). The basic idea is to use deterministic policies to remove excessive randomness, and to introduce useful stochasticity by placing a prior distribution on the policy parameters.