ABSTRACT

Markov decision processes model situations where a controller wishes to optimally control a system, making her decisions sequentially while facing stochastic behaviour of the system. Step after step, the Markov decision process goes through a sequence of states s₀, s₁, … from a set of states S. At each step, the controller chooses an action a ∈ A, which causes the process to move from the current state s to a new state t with fixed probability p(t|s, a). The probability that the decision process stops is 0, i.e., Σ_{t ∈ S} p(t|s, a) = 1 for every state s and action a, and the time horizon is unbounded; hence the decision process never stops. A history is an infinite sequence h = s₀ a₁ s₁ … such that at each step n ∈ ℕ, the controller chooses the action aₙ₊₁ knowing the sequence s₀, s₁, …, sₙ of previous states.
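
As an illustration only, the definitions above can be sketched in Python. The two-state, two-action transition table `P`, the helper names `check_never_stops` and `sample_history`, and all numerical values are assumptions made for this sketch and do not come from the paper. Because a history is infinite, the code samples only a finite prefix of one.

```python
import random

# Hypothetical example: states, actions and probabilities are illustrative only.
STATES = ["s", "t"]
ACTIONS = ["a", "b"]

# P[(state, action)] maps each successor state to p(successor | state, action).
P = {
    ("s", "a"): {"s": 0.5, "t": 0.5},
    ("s", "b"): {"s": 1.0},
    ("t", "a"): {"t": 1.0},
    ("t", "b"): {"s": 0.3, "t": 0.7},
}


def check_never_stops(p, tol=1e-9):
    """Return True if every row sums to 1, i.e. the stopping probability is 0."""
    return all(abs(sum(dist.values()) - 1.0) <= tol for dist in p.values())


def sample_history(p, strategy, s0, steps, rng):
    """Sample a finite prefix s0 a1 s1 ... a_steps s_steps of a history.

    `strategy` receives the tuple of states seen so far (s0, ..., sn)
    and returns the next action a_{n+1}.
    """
    states = [s0]
    history = [s0]
    for _ in range(steps):
        action = strategy(tuple(states))
        dist = p[(states[-1], action)]
        # Draw the next state t with probability p(t | s, a).
        nxt = rng.choices(list(dist), weights=list(dist.values()))[0]
        states.append(nxt)
        history += [action, nxt]
    return history


if __name__ == "__main__":
    assert check_never_stops(P)
    always_a = lambda seen_states: "a"  # a strategy that ignores the past
    print(sample_history(P, always_a, "s", 5, random.Random(0)))
```

The strategy here takes the whole sequence of past states as input, mirroring the requirement that the action aₙ₊₁ is chosen knowing s₀, s₁, …, sₙ; a strategy that looks only at the last state is a special case.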