ABSTRACT

This chapter introduces algorithms for reinforcement learning and presents model-based solutions. Model-based reinforcement learning methods assume that a complete mathematical model of the environment's dynamics is available, from which an optimal action can be derived for each state. Markov decision processes (MDPs) are a classic and widely used model for reinforcement learning problems. There are three basic families of methods for solving MDPs: linear programming, heuristic search, and dynamic programming. In an MDP, the task of making a sequence of decisions that maximizes the expected cumulative reward reduces to finding an optimal policy, that is, an optimal action for each state. Partially observable MDPs (POMDPs), in which the information received by the agent is incomplete or noisy, have a broad range of applications in fields such as robotics, finance, and medicine. The chapter examines the dynamic programming methods of policy iteration and value iteration for solving the Bellman optimality equations.
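
For reference, the Bellman optimality equations mentioned above take the following standard form for a discounted MDP; the notation (transition probabilities P, rewards R, discount factor \(\gamma\)) is assumed here rather than taken from the chapter:

\[
V^{*}(s) \;=\; \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\, R(s, a, s') + \gamma\, V^{*}(s') \,\bigr].
\]

Value iteration solves these equations by applying the right-hand side repeatedly as an update, \(V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,[\,R(s, a, s') + \gamma V_{k}(s')\,]\), until the values converge, while policy iteration alternates policy evaluation and policy improvement steps.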