ABSTRACT

In the preceding chapter, we have seen how a battlefield can be modeled as a stochastic game, and how an agent can learn from its opponent’s actions to form its own strategy accordingly. The learning approach taken there, reinforcement learning, has been shown to be very effective in a single-agent, stochastic environment (see [1,8,14] and references therein). To apply the same approach to a multiagent game, the basic reinforcement learning algorithm must be modified: because every agent in a multiagent game is learning, their interaction creates a time-variant environment, whereas the original formulation of the reinforcement learning algorithm assumes a time-invariant environment. To see why multiagent learning can create a time-variant environment, consider an individual agent X in a multiagent game. Because every agent is learning, the other agents’ reactions to the strategy improvement of agent X will depend on how agent X improves its strategy. Conversely, as all the other agents learn to react to agent X, the learning environment of agent X shifts. (To any individual agent, all the other agents constitute its learning environment.) Therefore, the learning approach taken by a single agent will affect how its own learning environment shifts over time.
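To make the time-variance concrete, consider the following sketch in standard Q-learning and stochastic-game notation (the symbols below are illustrative and not drawn from the original text). The single-agent update

\[
Q(s, a_X) \leftarrow Q(s, a_X) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a_X) \right]
\]

carries its usual convergence guarantees only when the transition probability \( P(s' \mid s, a_X) \) is fixed. From agent X’s point of view in a multiagent game, however, the effective transition probability at time \( t \) is

\[
P_t(s' \mid s, a_X) = \sum_{a_{-X}} \pi_t^{-X}(a_{-X} \mid s)\, P\bigl(s' \mid s, a_X, a_{-X}\bigr),
\]

where \( a_{-X} \) and \( \pi_t^{-X} \) denote the other agents’ joint action and joint policy. Because \( \pi_t^{-X} \) changes as the other agents learn, \( P_t \) depends on \( t \) (and the expected reward seen by agent X shifts in the same way), so the time-invariance assumption underlying the single-agent algorithm no longer holds.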