ABSTRACT

In this chapter, we review existing work from the emerging research area of global explanations of reinforcement learning (RL) agents. This is an important area of explainable agency, as the maturing of RL has led to the increasing deployment of RL agents in real-world settings. We focus on global explanations, which aim to describe or explain the agent's overall strategy or behavior, as opposed to local explanations, which focus on the agent's decision in a specific world state. We identify three main types of explanations proposed in the literature, namely those that: (1) provide interpretable representations of a policy or the underlying Markov decision process; (2) explain the agent's behavior through demonstrations of its policy; and (3) describe the agent's policy through a set of logical rules. We further discuss the evaluation methods used to assess the contribution of global explanations. We end with a discussion of emerging trends and gaps that suggest avenues for future work.