A Criterion for the Existence of Pure and Stationary Optimal Strategies in Markov Decision Processes

doi:10.5117/9789089640574-ch17

Chapter

ABSTRACT

Markov decision processes model situations in which a controller wishes to control a system optimally, taking her decisions sequentially while facing the stochastic behaviour of the system. Step after step, the Markov decision process goes through a sequence of states s₀, s₁, … from a set of states S. At each step, the controller chooses an action a ∈ A, which causes the process to move from the current state s to a new state t with fixed probability p(t|s, a). The process stops with probability 0, i.e., Σ_{t ∈ S} p(t|s, a) = 1 for every state s and action a, and the time horizon is unbounded; hence the decision process never stops. A history is an infinite sequence h = s₀ a₁ s₁ … such that at each step n ∈ ℕ, the controller has chosen the action aₙ₊₁ knowing the sequence s₀, s₁, …, sₙ of previous states.
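
To make the definitions concrete, the following is a minimal sketch of a finite prefix of a history, under stated assumptions. The two-state, two-action process, its transition probabilities, and the names `P`, `is_stochastic`, `history_prefix` and `pure_stationary` are all hypothetical and chosen only for illustration; none of them come from the chapter. The sketch checks that the transition probabilities sum to 1 for each state-action pair and then samples s₀ a₁ s₁ … aₙ sₙ, with each action chosen from the sequence of states seen so far.

```python
import random

# Hypothetical transition kernel p(t | s, a) for states {"s", "t"} and
# actions {"a", "b"}; the numbers are illustrative assumptions.
P = {
    ("s", "a"): {"s": 0.5, "t": 0.5},
    ("s", "b"): {"s": 1.0},
    ("t", "a"): {"t": 1.0},
    ("t", "b"): {"s": 0.25, "t": 0.75},
}


def is_stochastic(p, tol=1e-9):
    """Check that sum over t of p(t | s, a) equals 1 for every pair (s, a)."""
    return all(abs(sum(dist.values()) - 1.0) <= tol for dist in p.values())


def history_prefix(p, s0, strategy, n_steps, rng):
    """Sample the finite prefix s0 a1 s1 ... a_n s_n of a history.

    `strategy` maps the tuple of states visited so far to an action,
    mirroring a controller who knows s0, ..., s_n when choosing a_{n+1}.
    """
    states = [s0]
    history = [s0]
    for _ in range(n_steps):
        action = strategy(tuple(states))
        dist = p[(states[-1], action)]
        successors = list(dist)
        weights = [dist[t] for t in successors]
        nxt = rng.choices(successors, weights=weights)[0]
        states.append(nxt)
        history += [action, nxt]
    return history


def pure_stationary(states_so_far):
    """A deterministic choice that depends only on the current state."""
    return "a" if states_so_far[-1] == "s" else "b"


if __name__ == "__main__":
    assert is_stochastic(P)
    print(history_prefix(P, "s", pure_stationary, 5, random.Random(0)))
```

Since the process never stops, a true history is infinite; the sketch can only ever produce a prefix of length chosen by `n_steps`.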