ABSTRACT

The direct policy search approach tries to find the policy that maximizes the expected return. In this chapter, we introduce gradient-based algorithms for direct policy search. After the problem is formulated in Section 7.1, the gradient ascent algorithm is introduced in Section 7.2. Section 7.3 then describes its extension using natural gradients. Section 7.4 presents an application to computer graphics. Finally, Section 7.5 concludes the chapter.