chapter
Payoff Learning and Dynamics
By Hamidou Tembine
Pages 11

A central learning problem in dynamic environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the notion of value of information (VoI), i.e., the expected improvement in future decision quality that results from the information acquired through exploration.
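
To make the VoI idea concrete, the following is a minimal sketch and not the chapter's own formulation. It assumes Bernoulli payoffs with independent Beta(alpha, beta) beliefs per action and a one-step (myopic) lookahead. The names `posterior_mean` and `myopic_voi` are hypothetical and chosen only for illustration. The VoI of trying an action once is computed as the expected best posterior mean after the observation minus the best mean under current beliefs.

```python
from fractions import Fraction


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) belief over a Bernoulli payoff."""
    return Fraction(alpha, alpha + beta)


def myopic_voi(priors, arm):
    """One-step value of information of trying `arm` once.

    priors: list of (alpha, beta) integer pairs, one per action (assumed model).
    Returns E[best posterior mean after one observation] - best current mean.
    """
    means = [posterior_mean(a, b) for a, b in priors]
    best_now = max(means)

    # Beliefs about the other actions do not change after observing `arm`.
    others = [m for i, m in enumerate(means) if i != arm]
    best_other = max(others) if others else Fraction(0)

    a, b = priors[arm]
    p_success = means[arm]  # predictive probability of a payoff of 1
    mean_if_success = posterior_mean(a + 1, b)
    mean_if_failure = posterior_mean(a, b + 1)

    expected_best_after = (
        p_success * max(best_other, mean_if_success)
        + (1 - p_success) * max(best_other, mean_if_failure)
    )
    return expected_best_after - best_now


# Action 0 has been tried more often and looks better; action 1 is uncertain.
priors = [(3, 2), (1, 1)]
voi_per_action = [myopic_voi(priors, i) for i in range(len(priors))]
print(voi_per_action)
```

Under these assumptions, one more trial of the apparently best action 0 cannot change which action looks best, so its myopic VoI is zero, while a single trial of the less-explored action 1 has a strictly positive VoI because a success would make it the preferred action.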