ABSTRACT

Assume we can control an input x_n ∈ C ⊂ R and observe one response y_n such that E[y_n | x_n, β] = f(x_n; β), and that the objective is to keep all the responses close to a target T. We propose sequential designs that always improve on Bayesian certainty equivalence designs by searching for the best design in a family that contains them. To regulate the distance and direction in which they move away from the certainty equivalence choice, the new designs experiment on a credible region for the root of f(x; β) = T. These heuristics perturb certainty equivalence to incentivize ‘active’ learning about β and so improve future control. We also describe how to apply this approach to the response surface bandit, where we need to keep all the responses close to the maximum of f(x; β).
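
To make the idea concrete, the following is a minimal sketch rather than the designs proposed here. It assumes a hypothetical one-parameter model f(x; β) = βx with Gaussian noise of known variance and a normal prior on β, so that the posterior is available in closed form. The function names, the mixing weight `lam`, and the rule of moving toward an endpoint of the credible interval are all illustrative assumptions. Setting `lam = 0` recovers the certainty equivalence choice, so the family contains it.

```python
from statistics import NormalDist


def posterior(xs, ys, prior_mean=1.0, prior_var=1.0, noise_var=1.0):
    """Normal posterior (mean, variance) for beta in y = beta * x + noise."""
    precision = 1.0 / prior_var + sum(x * x for x in xs) / noise_var
    weighted = prior_mean / prior_var + sum(x * y for x, y in zip(xs, ys)) / noise_var
    return weighted / precision, 1.0 / precision


def certainty_equivalence(mean, target):
    """Plug the posterior mean in for beta and solve mean * x = target."""
    return target / mean


def root_credible_interval(mean, var, target, level=0.95):
    """Map a central credible interval for beta to one for the root target / beta."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * var ** 0.5
    beta_lo, beta_hi = mean - half_width, mean + half_width
    if beta_lo <= 0.0:
        # Assumption of this sketch: beta is credibly positive, so the
        # map beta -> target / beta is monotone on the interval.
        raise ValueError("sketch assumes the credible interval for beta is positive")
    lo, hi = sorted((target / beta_hi, target / beta_lo))
    return lo, hi


def perturbed_design(mean, var, target, lam, upward=True, level=0.95):
    """Hypothetical family of designs indexed by lam in [0, 1].

    lam = 0 gives the certainty equivalence design; lam = 1 gives an
    endpoint of the credible interval for the root.
    """
    x_ce = certainty_equivalence(mean, target)
    lo, hi = root_credible_interval(mean, var, target, level)
    endpoint = hi if upward else lo
    return x_ce + lam * (endpoint - x_ce)


# Made-up data, for illustration only.
xs, ys, target = [1.0, 2.0], [2.1, 3.9], 4.0
mean, var = posterior(xs, ys)
x_ce = certainty_equivalence(mean, target)
lo, hi = root_credible_interval(mean, var, target)
x_next = perturbed_design(mean, var, target, lam=0.5)
```

In this sketch `lam` controls the distance moved away from certainty equivalence and `upward` the direction; how to choose both is exactly what a search over the family would have to decide.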