ABSTRACT

We present a hybrid architecture, called EXP1, that balances exploration and exploitation in order to efficiently solve two-dimensional mazes with large state spaces (e.g. 262,144 states). To achieve this, it draws on the strengths of the Genetic Algorithm in search and optimisation, and on the combined strengths of the Radial Basis Function Neural Network and the Temporal Difference learning algorithm in approximating continuous functions with strong temporal dependence. The Neural Network acts as an Adaptive Heuristic Critic (AHC): over successive trials it learns the V-function, a continuous mapping from real-valued positions in the maze to the value of being at those positions. EXP1 solved all the mazes on which we tested it and proved robust to changes in its internal parameters. It also showed some favourable capabilities in responding to time-varying environments.
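To make the critic's role concrete, the following is a minimal sketch of a Radial Basis Function approximation of the V-function trained with a one-step Temporal Difference, TD(0), update. It is an illustration under stated assumptions, not the EXP1 implementation. The class name RBFCritic, the regular grid of centres over a unit-square maze, the normalised Gaussian units, the values of width, alpha and gamma, and the straight-line training path are all hypothetical choices made for the example. The Genetic Algorithm component is omitted.

```python
import math


class RBFCritic:
    """Hypothetical critic: V(x, y) as a weighted sum of normalised Gaussian RBFs."""

    def __init__(self, grid=5, width=0.25, alpha=0.1, gamma=0.95):
        # Assumption: centres lie on a regular grid over the unit square.
        step = 1.0 / (grid - 1)
        self.centres = [(i * step, j * step)
                        for i in range(grid) for j in range(grid)]
        self.width = width    # shared Gaussian width (assumed)
        self.alpha = alpha    # learning rate (assumed)
        self.gamma = gamma    # discount factor (assumed)
        self.weights = [0.0] * len(self.centres)

    def features(self, pos):
        """Normalised Gaussian activations for a real-valued position."""
        x, y = pos
        acts = [math.exp(-((x - cx) ** 2 + (y - cy) ** 2)
                         / (2.0 * self.width ** 2))
                for cx, cy in self.centres]
        total = sum(acts)
        return [a / total for a in acts]

    def value(self, pos):
        """Current estimate of V at pos."""
        return sum(w * f for w, f in zip(self.weights, self.features(pos)))

    def td_update(self, pos, reward, next_pos, terminal=False):
        """One TD(0) step: move V(pos) towards reward + gamma * V(next_pos)."""
        target = reward
        if not terminal:
            target += self.gamma * self.value(next_pos)
        delta = target - self.value(pos)  # TD error
        for i, f in enumerate(self.features(pos)):
            self.weights[i] += self.alpha * delta * f
        return delta


# Usage: repeated trials along a fixed diagonal path ending at a rewarded goal.
critic = RBFCritic()
path = [(k / 10, k / 10) for k in range(11)]
for trial in range(200):
    for k in range(len(path) - 1):
        at_goal = (k + 1 == len(path) - 1)
        critic.td_update(path[k], 1.0 if at_goal else 0.0,
                         path[k + 1], terminal=at_goal)

near_start = critic.value((0.1, 0.1))
near_goal = critic.value((0.9, 0.9))
```

After training, the estimated value near the goal should exceed the value near the start, which is the gradient an agent would follow when exploiting the critic. Because the units are Gaussian, each update also changes the estimate at neighbouring positions, so the learned mapping is continuous over the maze rather than a table of discrete states.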