ABSTRACT

Learning to make good choices in a probabilistic environment requires that the decision maker resolve the tension between exploration (learning about all available options) and exploitation (consistently choosing the best option in order to maximize rewards). We present a mathematical learning model that makes selections in a repeated-choice probabilistic task based on the expected payoff associated with each option and the information gain that will result from choosing that option. This model can be used to analyze the relative impact of exploration and exploitation over time and under different conditions. It predicts the aggregated and individual learning trajectories of participants in various versions of the task sufficiently well to support our basic argument: information gain is a valid and rational criterion underlying human decision making. Future modeling work will address the exact nature of the interaction between exploration and exploitation.