ABSTRACT

Consider a robot wandering around an unfamiliar environment, performing actions, and observing the perceptual consequences. The robot's task is to construct an internal model of its environment, a model that allows it to predict the outcomes of its actions and to determine which sequences of actions reach particular goal states. Rivest and Schapire (1987a, 1987b, 1988) have studied this problem and have designed a symbolic algorithm that strategically explores finite-state environments and infers their structure. The heart of this algorithm is a clever representation of the environment called an update graph. We have developed a connectionist implementation of the update graph using a highly specialized network architecture. We also describe a technique for using the connectionist update graph to guide the robot from an arbitrary starting state to a goal state. This technique requires a critic that associates the update graph's current state with the expected time to reach the goal state. At each time step, the robot chooses the action that minimizes the critic's output. The control acquisition technique is demonstrated on several environments.
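
The action-selection rule described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method of the paper: the environment is a hypothetical ring of six states, the function `step` stands in for the learned update graph, and the critic is an exact table of steps-to-goal computed by breadth-first search rather than a trained network. Only the greedy rule itself, choosing at each time step the action whose predicted successor state has the smallest critic output, follows the text.

```python
from collections import deque

# Hypothetical toy environment (not from the paper): a ring of N states.
N = 6
ACTIONS = ("left", "right")


def step(state, action):
    """Deterministic transition model; a stand-in for the learned update graph."""
    return (state - 1) % N if action == "left" else (state + 1) % N


def exact_critic(goal):
    """Return a table mapping each state to its number of steps to the goal.

    Assumption: the critic is computed exactly by breadth-first search over
    reversed transitions, in place of a learned critic.
    """
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        # Find every predecessor p that reaches s in one action.
        for p in range(N):
            for a in ACTIONS:
                if step(p, a) == s and p not in dist:
                    dist[p] = dist[s] + 1
                    queue.append(p)
    return dist


def greedy_action(state, critic):
    """Pick the action whose predicted next state minimizes the critic output."""
    return min(ACTIONS, key=lambda a: critic[step(state, a)])


def run_to_goal(start, goal, max_steps=100):
    """Follow the greedy rule from start until the goal is reached."""
    critic = exact_critic(goal)
    state, path = start, []
    while state != goal and len(path) < max_steps:
        action = greedy_action(state, critic)
        state = step(state, action)
        path.append(action)
    return path
```

For example, `run_to_goal(0, 1)` returns `["right"]`, because the state reached by moving right has a critic value of zero. Ties between actions are broken by the order of `ACTIONS`, which is an arbitrary choice of this sketch.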