ABSTRACT

Besides an agent and an environment, Sutton and Barto [SB98] identify four further elements of reinforcement learning: a policy, a reward function, a value function, and, optionally, a model of the environment.

In multi-agent robotics, robotic agents may have many possible actions that they can take in response to a stimulus, and a policy determines which of the available actions the robots should undertake. Reinforcement is then applied based on “the results of that decision, and the policy is altered in a manner consistent with the outcome (reward or punishment)” [Ark98]. The ultimate goal is to “learn an optimal policy that chooses the best action for every set of possible inputs.” In this way the robots improve their performance, finding suitable behaviors as they interact with their environment. The approach also allows the agents to adapt to different environmental conditions. For these reasons, reinforcement learning has become an attractive learning technique in multi-agent robotics [Bal98, KLM96].
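The decision-and-reinforcement cycle described above can be illustrated with a minimal sketch. It is not the method of any cited work: it assumes a single agent, a small table of action-value estimates, an epsilon-greedy policy, and a made-up reward function. The names `choose_action`, `reinforce`, `toy_reward`, and `train`, as well as the stimuli and actions, are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def choose_action(q, stimulus, actions, epsilon, rng):
    """Epsilon-greedy policy: usually pick the best-valued action, sometimes explore."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(stimulus, a)])


def reinforce(q, stimulus, action, reward, step_size=0.1):
    """Alter the policy's value estimate in the direction of the observed outcome."""
    q[(stimulus, action)] += step_size * (reward - q[(stimulus, action)])


def toy_reward(stimulus, action):
    """Hypothetical environment: +1 (reward) for a suitable response, -1 (punishment) otherwise."""
    suitable = {"obstacle": "turn", "clear": "forward"}
    return 1.0 if suitable[stimulus] == action else -1.0


def train(episodes=500, epsilon=0.1, seed=0):
    """Run the choose/act/reinforce loop and return the learned greedy policy."""
    rng = random.Random(seed)
    actions = ["forward", "turn", "stop"]
    stimuli = ["obstacle", "clear"]
    q = defaultdict(float)  # value estimates start at 0.0 for every (stimulus, action)
    for _ in range(episodes):
        stimulus = rng.choice(stimuli)
        action = choose_action(q, stimulus, actions, epsilon, rng)
        reinforce(q, stimulus, action, toy_reward(stimulus, action))
    # Greedy policy: the best-valued action for each stimulus.
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in stimuli}


if __name__ == "__main__":
    print(train())
```

In this toy setting the learned mapping from stimulus to action plays the role of the policy, and `toy_reward` plays the role of the reward function. A multi-robot system would additionally have to handle sequential states, delayed rewards, and interactions among agents, none of which this sketch attempts.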