ABSTRACT

This chapter considers a model-free, least-squares algorithm for approximate policy iteration. An online variant of this algorithm is developed, and along the way some important issues that arise in online reinforcement learning are highlighted. Additionally, a procedure for integrating prior knowledge about the policy into this online variant is described, and a continuous-action approximator for the offline variant is introduced. These developments are experimentally evaluated on several control problems.