ABSTRACT

In this chapter, we introduce transition probability estimation methods for model-based reinforcement learning (Wang & Dietterich, 2003; Deisenroth & Rasmussen, 2011). Among the methods reviewed in Section 10.1, a nonparametric transition model estimator called least-squares conditional density estimation (LSCDE) (Sugiyama et al., 2010) is shown to be the most promising approach (Tangkaratt et al., 2014a). Section 10.2 then describes how the estimated transition model can be used in model-based reinforcement learning. In Section 10.3, the experimental performance of a model-based policy-prior search method is evaluated. Finally, Section 10.4 concludes the chapter.