ABSTRACT

This chapter describes an algorithm for approximate policy search in continuous-state, discrete-action problems. The algorithm looks for the best policy that can be represented using a given number of basis functions associated with discrete actions. The locations and shapes of the basis functions, together with the action assignments, are optimized using the cross-entropy method, so that the empirical return from a representative set of initial states is maximized. The resulting cross-entropy policy search algorithm is evaluated in problems with two to six state variables.
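
As a rough illustration of the idea summarized above, the following minimal sketch applies the cross-entropy method to a toy one-dimensional problem. Everything specific in it is an assumption made for illustration and is not taken from the chapter: the dynamics and reward, the Gaussian radial basis functions, the rule that the policy applies the action of the most strongly activated basis function, the Gaussian and Bernoulli sampling densities, and all names and parameter values (`ACTIONS`, `X0`, `ce_policy_search`, and so on).

```python
import math
import random
import statistics

ACTIONS = (-1.0, 1.0)          # hypothetical discrete action set
X0 = (-0.8, -0.4, 0.4, 0.8)    # assumed representative initial states

def policy(x, centers, widths, assign):
    # Apply the action assigned to the most strongly activated basis function.
    acts = [math.exp(-((x - c) / b) ** 2) for c, b in zip(centers, widths)]
    return ACTIONS[assign[acts.index(max(acts))]]

def empirical_return(centers, widths, assign, horizon=20, gamma=0.95):
    # Average discounted return over X0 for toy dynamics x <- x + 0.1 * u
    # (clipped to [-1, 1]) with reward -|x|; both are illustrative assumptions.
    total = 0.0
    for x in X0:
        for k in range(horizon):
            u = policy(x, centers, widths, assign)
            x = max(-1.0, min(1.0, x + 0.1 * u))
            total += gamma ** k * (-abs(x))
    return total / len(X0)

def ce_policy_search(n_basis=2, n_samples=50, n_elite=5, n_iter=20, seed=0):
    rng = random.Random(seed)
    # Sampling densities: Gaussians for centers and widths,
    # Bernoulli for the action index assigned to each basis function.
    mu_c, sd_c = [0.0] * n_basis, [1.0] * n_basis
    mu_b, sd_b = [0.5] * n_basis, [0.5] * n_basis
    p_a = [0.5] * n_basis
    best = (-math.inf, None, None, None)
    for _ in range(n_iter):
        samples = []
        for _ in range(n_samples):
            c = [rng.gauss(m, s) for m, s in zip(mu_c, sd_c)]
            b = [max(1e-3, abs(rng.gauss(m, s))) for m, s in zip(mu_b, sd_b)]
            a = [1 if rng.random() < p else 0 for p in p_a]
            samples.append((empirical_return(c, b, a), c, b, a))
        # Keep the highest-scoring samples and refit the densities to them.
        samples.sort(key=lambda s: s[0], reverse=True)
        elite = samples[:n_elite]
        if elite[0][0] > best[0]:
            best = elite[0]
        for i in range(n_basis):
            cs = [e[1][i] for e in elite]
            bs = [e[2][i] for e in elite]
            mu_c[i], sd_c[i] = statistics.mean(cs), statistics.pstdev(cs) + 1e-6
            mu_b[i], sd_b[i] = statistics.mean(bs), statistics.pstdev(bs) + 1e-6
            p_a[i] = statistics.mean(e[3][i] for e in elite)
    return best  # (score, centers, widths, action indices)
```

Calling `ce_policy_search()` returns the best-scoring sample found, as a tuple of its empirical return, basis-function centers, widths, and action indices. The sketch omits refinements such as smoothed parameter updates and stopping criteria, which a practical implementation would likely need.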