ABSTRACT

In this chapter we show how the theory of multi-armed bandits can usefully be applied to many problems of optimal search. The multi-armed bandit (MAB) problem was first posed around the end of World War II and defied solution for three decades until it was finally solved by John Gittins (1979). Gittins showed that the optimal policy for this decision problem is an index policy: each bandit is assigned an index that depends only on that bandit's own state, and the optimal action at each stage is to sample the bandit with the currently highest index. These indices are often referred to as Gittins indices in his honor, and we will follow this tradition.
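
To make the index-policy idea concrete, the following minimal Python sketch (not part of the original text) illustrates the selection rule: each arm carries an index computed from that arm's own sampling history alone, and at every step the arm with the largest index is played. The function index_proxy below is a hypothetical stand-in; computing true Gittins indices requires a separate dynamic-programming calibration, so this sketch uses a simple Bayesian proxy for Bernoulli arms purely to show the shape of the policy.

    import random

    def index_proxy(successes, failures, discount=0.9):
        """Hypothetical stand-in for a Gittins index on a Bernoulli arm.

        True Gittins indices come from a dynamic-programming calculation;
        this proxy (posterior mean plus a shrinking exploration bonus) is
        used only to illustrate the structure of an index policy.
        """
        n = successes + failures
        posterior_mean = (successes + 1) / (n + 2)  # Beta(1,1) prior
        exploration_bonus = discount / (n + 1)      # shrinks as data accrue
        return posterior_mean + exploration_bonus

    def index_policy(true_probs, horizon=1000, seed=0):
        """At every step, play the arm whose current index is largest."""
        rng = random.Random(seed)
        k = len(true_probs)
        succ = [0] * k
        fail = [0] * k
        total_reward = 0
        for _ in range(horizon):
            # Defining feature of an index policy: each arm's index
            # depends only on that arm's own history.
            indices = [index_proxy(succ[i], fail[i]) for i in range(k)]
            arm = max(range(k), key=lambda i: indices[i])
            if rng.random() < true_probs[arm]:
                succ[arm] += 1
                total_reward += 1
            else:
                fail[arm] += 1
        return total_reward, succ, fail

    if __name__ == "__main__":
        reward, succ, fail = index_policy([0.3, 0.5, 0.7])
        print("total reward:", reward)
        print("pulls per arm:", [s + f for s, f in zip(succ, fail)])

The key structural point, which holds for the true Gittins index as well, is that the index of each arm is a function of that arm's state alone, which reduces a coupled multi-arm optimization to independent per-arm calculations.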