ABSTRACT

This chapter is dedicated to reinforcement learning (RL) because of its increasing popularity within the Machine Learning community. In 2019 alone, more than 25 papers dedicated to RL were submitted to arXiv under the q-fin (quantitative finance) classification. The chapter introduces the core concepts of RL and follows relatively closely the notation of Sutton and Barto, which is widely considered a solid reference in the field, along with Bertsekas. One central tool in the field is the Markov Decision Process (MDP). MDPs, like all RL frameworks, involve the interaction between an agent (e.g., a trader or portfolio manager) and an environment (e.g., a financial market). The agent performs actions that may alter the state of the environment and receives a reward for each action. Many solutions have been proposed to solve Markov Decision Processes in continuous spaces. A popular extension of REINFORCE is the so-called actor-critic method, which combines policy gradients with Q- or value-learning.
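
To make the agent-environment interaction concrete, the sketch below shows a minimal tabular Q-learning loop on a toy market environment. It is purely illustrative and not taken from the chapter: the environment (`ToyMarketEnv`), its states ("market regimes"), actions, reward scheme, and all hyperparameters are hypothetical assumptions chosen only to show the structure of an MDP loop (observe state, choose action, receive reward, update value estimates).

```python
# Minimal sketch of the agent-environment loop in an MDP (illustrative only).
# ToyMarketEnv, its reward scheme, and all parameters are hypothetical assumptions.
import numpy as np

class ToyMarketEnv:
    """Toy environment: states are market regimes, actions are 'stay flat' (0) or 'take risk' (1)."""
    def __init__(self, n_states=3, seed=0):
        self.n_states = n_states
        self.rng = np.random.default_rng(seed)
        self.state = 0

    def reset(self):
        self.state = int(self.rng.integers(self.n_states))
        return self.state

    def step(self, action):
        # Reward: the risky action pays off only in the favorable regime (state 2).
        reward = 1.0 if (action == 1 and self.state == 2) else (-0.2 if action == 1 else 0.0)
        self.state = int(self.rng.integers(self.n_states))  # i.i.d. regime switching
        return self.state, reward

# Tabular Q-learning agent with epsilon-greedy exploration.
n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

env = ToyMarketEnv(n_states)
state = env.reset()
for t in range(5000):
    # The agent acts (sometimes exploring); the environment returns a reward and a new state.
    action = int(env.rng.integers(n_actions)) if env.rng.random() < epsilon else int(Q[state].argmax())
    next_state, reward = env.step(action)
    # Q-learning update: move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a').
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print(np.round(Q, 2))  # The risky action should dominate only in the favorable state.
```

Actor-critic methods mentioned at the end of the abstract replace the tabular value table above with a learned critic (Q- or value-function) and the greedy rule with a parameterized policy trained by policy gradients.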