ABSTRACT

In this chapter we show how the theory of multi-armed bandits can usefully be applied to many problems of optimal search. The multi-armed bandit (MAB) problem was first posed around the end of World War II and defied solution for three decades until it was finally solved by John Gittins (1979). Gittins showed that the optimal policy for this decision problem is an index policy: each bandit is assigned an index that depends only on that bandit's own state, and the optimal action at each stage is to sample the bandit with the currently highest index. These indices are often referred to as Gittins indices in his honor, and we will follow this tradition.
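
To make the index-policy idea concrete, the following minimal Python sketch (not part of the original text) illustrates the selection rule: each arm carries an index computed from that arm's own sampling history alone, and at every step the arm with the largest index is played. The function index_proxy below is a hypothetical stand-in; computing true Gittins indices requires a separate dynamic-programming calibration, so this sketch uses a simple Bayesian proxy for Bernoulli arms purely to show the shape of the policy.

    import random

    def index_proxy(successes, failures, discount=0.9):
        """Hypothetical stand-in for a Gittins index on a Bernoulli arm.

        True Gittins indices come from a dynamic-programming calculation;
        this proxy (posterior mean plus a shrinking exploration bonus) is
        used only to illustrate the structure of an index policy.
        """
        n = successes + failures
        posterior_mean = (successes + 1) / (n + 2)  # Beta(1,1) prior
        exploration_bonus = discount / (n + 1)      # shrinks as data accrue
        return posterior_mean + exploration_bonus

    def index_policy(true_probs, horizon=1000, seed=0):
        """At every step, play the arm whose current index is largest."""
        rng = random.Random(seed)
        k = len(true_probs)
        succ = [0] * k
        fail = [0] * k
        total_reward = 0
        for _ in range(horizon):
            # Defining feature of an index policy: each arm's index
            # depends only on that arm's own history.
            indices = [index_proxy(succ[i], fail[i]) for i in range(k)]
            arm = max(range(k), key=lambda i: indices[i])
            if rng.random() < true_probs[arm]:
                succ[arm] += 1
                total_reward += 1
            else:
                fail[arm] += 1
        return total_reward, succ, fail

    if __name__ == "__main__":
        reward, succ, fail = index_policy([0.3, 0.5, 0.7])
        print("total reward:", reward)
        print("pulls per arm:", [s + f for s, f in zip(succ, fail)])

The key structural point, which holds for the true Gittins index as well, is that the index of each arm is a function of that arm's state alone, which reduces a coupled multi-arm optimization to independent per-arm calculations.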