ABSTRACT

A novel adaptive control algorithm for constrained Markov chains whose transition probabilities are unknown is presented in this chapter [13]. A finite set of algebraic constraints is considered. The algorithm is based on the Lagrange multipliers approach [14] with an additional regularizing term that ensures the continuity of the corresponding saddle point with respect to the transition probability matrix and the conditional expectations of the loss and constraint functions. In this control algorithm the transition probabilities of the Markov chain are not estimated; the control policy uses only observations of the realizations of the loss function and the constraints. The control law is adapted using the Bush-Mosteller reinforcement scheme [14-15], which is related to stochastic approximation procedures [16-18]. The Bush-Mosteller scheme [19] is commonly used in the design of stochastic learning automata to solve many engineering problems. Learning deals with the ability of a system to improve its responses on the basis of past experience. Controlling a Markov chain may be reduced to designing a control policy that achieves optimality of the control strategy under (or without) constraints. In this study, optimality is associated with the minimization of a loss function, assumed to be bounded, under a set of algebraic constraints. The main features of this adaptive algorithm are:

- the use of the Stochastic Learning Automata approach to construct a recursive procedure generating the asymptotically optimal control policy;

- the use of a Modified Lagrange Function including a regularizing term to guarantee the continuity in the parameters of the corresponding linear programming problem whose solution is connected with the optimal values of the main loss function under the given constraints;

- the estimation of the adaptation rate and its optimization within a class of the design parameters involved in the suggested adaptive procedure.
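One common way to realize the regularized (modified) Lagrange function mentioned above is a Tikhonov-type quadratic correction; the following is only a sketch, and the symbols $V$, $V_m$, $c$, and $\delta$ are illustrative placeholders rather than the chapter's own notation:

```latex
L_\delta(c,\lambda)
  \;=\; V(c) \;+\; \sum_{m=1}^{M} \lambda_m V_m(c)
  \;+\; \frac{\delta}{2}\bigl(\lVert c\rVert^2 - \lVert \lambda\rVert^2\bigr),
\qquad \delta > 0,\; \lambda_m \ge 0,
```

where $V$ is the expected loss, $V_m$ are the constraint functions, and $c$ collects the decision variables of the associated linear programming problem. The $\delta$-term makes $L_\delta$ strongly convex in $c$ and strongly concave in $\lambda$, so the saddle point is unique and depends continuously on the problem data, which is the continuity property the regularizing term is introduced to guarantee.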
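To make the learning-automaton ingredient concrete, the following is a minimal sketch of a Bush-Mosteller-type probability update, assuming actions are chosen from a finite set and the environment returns a normalized reward in [0, 1]; the function name, the fixed step size `gamma`, and the reward normalization are illustrative assumptions, not the chapter's exact procedure:

```python
import numpy as np

def bush_mosteller_update(p, action, reward, gamma):
    """One Bush-Mosteller step.

    Shifts probability mass toward the chosen `action` in proportion
    to the normalized reward (reward in [0, 1]); the update preserves
    the probability simplex because the correction (e - p) sums to zero.
    """
    e = np.zeros_like(p)
    e[action] = 1.0
    return p + gamma * reward * (e - p)

# Illustrative use: start from the uniform policy over 4 actions and
# reinforce action 2 after observing a favorable (reward = 1) outcome.
p = np.full(4, 0.25)
p = bush_mosteller_update(p, action=2, reward=1.0, gamma=0.1)
```

Iterating such updates with the realized losses and constraint observations as the reinforcement signal is what drives the policy toward the optimum without estimating the transition probabilities.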