ABSTRACT

This paper presents a hybrid control architecture for solving on-line optimal control problems. In this architecture, the control law is dynamically scheduled between a reinforcement controller and a stabilizing controller so that the closed-loop performance is smoothly transformed from reactive behavior to predictive behavior. Based on a modified Q-learning technique, the reinforcement controller consists of two components: a policy function and a Q function. The policy function is explicitly incorporated so as to bypass the minimum operator normally required for selecting actions and updating the Q function. This architecture is then applied to a repetitive operation using a second-order linear time-variant plant with a nonlinear control structure. In this operation, the reinforcement signals are based on set-point errors, and the reinforcement controller is generalized using second-order B-spline networks. This example illustrates how, for a non-optimally tuned stabilizing controller, the closed-loop performance can be bootstrapped through reinforcement learning. Results show that the set-point performance of the hybrid controller improves over that of the fixed-structure controller by discovering better control strategies that compensate for the non-optimal gains and the nonlinear control structure.
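To make the modified Q-learning idea concrete, the following is a minimal, illustrative sketch (not the authors' implementation): the Q function bootstraps through the explicitly stored policy action instead of taking a minimum over actions, and the policy is improved incrementally, so the minimum operator is bypassed in both action selection and the Q update. Discrete state/action tables, the cost-minimization sign convention, and the learning-rate and discount values are assumptions made purely for readability; the paper itself generalizes both functions with second-order B-spline networks.

```python
import numpy as np

n_states, n_actions = 10, 5          # assumed sizes for illustration only
alpha, gamma = 0.1, 0.95             # assumed learning rate and discount factor

Q = np.zeros((n_states, n_actions))  # Q(s, a): estimated cost-to-go
policy = np.zeros(n_states, dtype=int)  # explicitly stored policy action per state


def update(s, a, cost, s_next):
    """One learning step driven by the reinforcement signal (set-point error cost)."""
    # Q update: bootstrap through the stored policy action,
    # bypassing the min over actions used in standard Q-learning.
    target = cost + gamma * Q[s_next, policy[s_next]]
    Q[s, a] += alpha * (target - Q[s, a])

    # Policy update: adopt the executed action if it now looks cheaper than
    # the stored one, so no explicit minimization over the action set is taken.
    if Q[s, a] < Q[s, policy[s]]:
        policy[s] = a
```

In use, actions would be drawn from `policy` with some exploration, and the resulting reinforcement controller would be blended with the stabilizing controller as the abstract describes; the blending schedule itself is outside the scope of this sketch.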