ABSTRACT

In this chapter, the authors introduce a class of adaptive controllers for Markov chains that meet the challenge under fairly relaxed conditions. They then turn their attention to the larger class of semi-Markov decision processes (SMDPs). The authors present two additional motivations for extending their algorithm to SMDPs, show that these controllers achieve the minimum long-run average cost, and discuss their application to several problems of interest. A common procedure calls for on-line estimation of the unknown parameter and the use of this estimate to generate the control law. This certainty equivalence approach may fail to achieve optimal performance. The certainty equivalence controller based on this estimator attains optimal performance. Similar results are achieved by Milito and Cruz with an algorithm based on a functional that accounts for the estimation and control objectives simultaneously. Finally, the authors emphasize that the model fits the description of a phone operator service center.