ABSTRACT

This chapter discusses the role of the basal ganglia (BG) in reinforcement learning (RL). According to an influential hypothesis, phasic dopamine (DA) encodes a reward prediction error (RPE), which serves as a teaching signal for RL. Studies have shown that a burst of firing in midbrain DA neurons follows reward presentation, but that with learning this burst shifts to the earliest predictor of the reward. This pattern appears to support the RPE hypothesis. However, although many studies have concluded that DA manipulations can change learning, they usually conflate learning and performance. Recent findings are incompatible with the RPE hypothesis: they show that different populations of DA neurons represent velocity or force vectors independent of learning or outcome valence, and that manipulations of DA signaling produce performance deficits without affecting learning. The transition control model suggests that DA signaling is not a teaching signal in RL, but instead contributes to adaptive gain control that modifies performance online, whether to escape from harm or to approach reward.