ABSTRACT

This paper presents a modification to the BOXES-ASE/ACE reinforcement learning algorithm that improves implementation efficiency. We introduce a state history queue (SHQ) to replace the decay variables associated with each state, decoupling the hardware complexity from the number of control states. We analyzed the effectiveness of the SHQ and constructed both a simulation and a hardware implementation of a pole-cart balancer. The analysis shows that the technique preserves control performance while substantially reducing the computation time and memory required to track access to each state.
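
As a rough illustration of the idea only (not the paper's exact formulation), the sketch below keeps a fixed-depth queue of recently visited state indices and applies decaying credit to those entries alone, instead of maintaining a decay variable for every state. The names shq_push and shq_update, the queue depth, and the update rule are assumptions made for illustration.

```c
/* Minimal SHQ sketch: a fixed-capacity ring buffer of recently visited
 * state indices.  Hypothetical names and parameters; the update rule is
 * an assumed, simplified stand-in for the ASE/ACE learning rules. */
#include <stdio.h>

#define SHQ_DEPTH  8         /* assumed queue length, independent of state count */
#define NUM_STATES 162       /* e.g. the classic BOXES pole-cart discretisation */

static int   shq[SHQ_DEPTH];     /* indices of the most recently visited states */
static int   shq_len  = 0;       /* number of valid entries */
static int   shq_head = 0;       /* position of the newest entry */
static float weight[NUM_STATES]; /* per-state weights being learned */

/* Record a newly visited state; the oldest entry is overwritten when full. */
void shq_push(int state)
{
    shq_head = (shq_head + 1) % SHQ_DEPTH;
    shq[shq_head] = state;
    if (shq_len < SHQ_DEPTH)
        shq_len++;
}

/* Apply a reinforcement signal only to states held in the queue, with
 * geometrically decaying credit for older entries.  No per-state trace
 * memory or sweep over all states is required. */
void shq_update(float reinforcement, float decay)
{
    float credit = 1.0f;
    for (int i = 0; i < shq_len; i++) {
        int idx = (shq_head - i + SHQ_DEPTH) % SHQ_DEPTH;
        weight[shq[idx]] += reinforcement * credit;
        credit *= decay;
    }
}

int main(void)
{
    shq_push(17);             /* visit a few states */
    shq_push(42);
    shq_update(-1.0f, 0.9f);  /* e.g. a failure signal */
    printf("w[42] = %f, w[17] = %f\n", weight[42], weight[17]);
    return 0;
}
```

In this sketch the memory and per-update work scale with the (fixed) queue depth rather than with the number of control states, which is the decoupling the abstract refers to.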