Penalty Function Approach | 16 | Self-Learning Control of Finite Marko

ABSTRACT

The design of an adaptive learning control algorithm for controlled Markov chains is based on minimizing a loss function subject to algebraic constraints. We first introduce the definitions of the controlled Markov chains, the loss function, and the constraints considered in this study.
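
To make the idea of constrained loss minimization concrete, the following is a minimal sketch of a quadratic penalty method on a toy problem. Everything in it is an assumption made for illustration and is not taken from the chapter: the loss f(x) = (x0 - 1)^2 + (x1 - 2)^2, the single equality constraint x0 + x1 = 1, the schedule of penalty weights, and the hypothetical function names `penalized_minimum` and `solve_with_penalty`. The chapter's own loss function and constraints are defined over controlled Markov chains and may be handled differently.

```python
def penalized_minimum(mu, start, steps=200):
    """Gradient descent on f(x) + (mu / 2) * g(x)**2 for the toy problem.

    f(x) = (x0 - 1)^2 + (x1 - 2)^2 is the illustrative loss and
    g(x) = x0 + x1 - 1 is the illustrative equality constraint residual.
    """
    x0, x1 = start
    # The Hessian of the penalized objective has eigenvalues 2 and 2 + 2*mu,
    # so a step of 1 / (2 + 2*mu) keeps plain gradient descent stable.
    step = 1.0 / (2.0 + 2.0 * mu)
    for _ in range(steps):
        g = x0 + x1 - 1.0                    # constraint residual
        grad0 = 2.0 * (x0 - 1.0) + mu * g    # d/dx0 of the penalized loss
        grad1 = 2.0 * (x1 - 2.0) + mu * g    # d/dx1 of the penalized loss
        x0 -= step * grad0
        x1 -= step * grad1
    return x0, x1


def solve_with_penalty(mus=(1.0, 10.0, 100.0, 1000.0)):
    """Solve a sequence of penalized problems with growing weight mu.

    Each solve is warm-started from the previous solution. Returns a list of
    (mu, x, violation) tuples, where violation = |x0 + x1 - 1|.
    """
    x = (0.0, 0.0)
    history = []
    for mu in mus:
        x = penalized_minimum(mu, x)
        history.append((mu, x, abs(x[0] + x[1] - 1.0)))
    return history


if __name__ == "__main__":
    for mu, x, violation in solve_with_penalty():
        print(f"mu={mu:7.1f}  x=({x[0]:.6f}, {x[1]:.6f})  violation={violation:.6f}")
```

For this toy problem the exact constrained minimizer is (0, 1), and the penalized minimizer violates the constraint by 2 / (1 + mu), so the violation shrinks as the penalty weight grows. The sketch only illustrates the general trade-off between loss and constraint violation that a penalty term introduces.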