ABSTRACT

This chapter discusses several typical deep reinforcement learning (DRL) algorithms that apply deep neural networks (DNNs) to approximate value functions, policy functions, and RL models. Well-known DNN architectures include convolutional neural networks, used in computer vision and automatic speech recognition, and recurrent neural networks, used in language modeling. Deep Q-Network (DQN) is a model-free, off-policy algorithm in which only samples from the emulator are used to solve the task. DQN employs two primary techniques to stabilize Q-learning: a fixed target Q-network and experience replay. With these techniques, DQN stabilizes the use of deep learning for value function approximation, a process previously believed to be unstable. Similar to DQN, deep deterministic policy gradient (DDPG) uses a fixed target network to reduce the variance of value function estimates and experience replay to break the correlation of sampled data during training.
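
The two stabilization techniques mentioned above can be sketched as follows. This is a minimal, illustrative sketch only, not the chapter's implementation; all names (`ReplayBuffer`, `sync_target`) and the toy transitions are hypothetical, and the networks are represented abstractly as parameter dictionaries.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer (hypothetical helper) that stores transitions and
    samples minibatches uniformly, breaking the temporal correlation of
    consecutive experience."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # old transitions are evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling decorrelates the training data.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def sync_target(q_params, target_params):
    """Copy the online Q-network parameters into the fixed target network.
    In DQN this is done only every C steps, so the bootstrap target stays
    frozen in between, stabilizing the Q-learning updates."""
    target_params.update(q_params)

# Usage sketch with toy transitions:
buf = ReplayBuffer(capacity=1000)
for t in range(5):
    buf.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buf.sample(batch_size=3)

q_params = {"w": 0.5}
target_params = {"w": 0.0}
sync_target(q_params, target_params)  # target now mirrors the online network
```

The same two ingredients carry over to DDPG, which additionally keeps target copies of both the actor and the critic.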