ABSTRACT

In a distributed system, each process executes actions on the basis of local information: its own state, together with the states of its neighbors or the messages received through its incoming channels. Many applications need to determine the global state of the system by collecting the local states of the component processes. These include:

• Computation of the network topology
• Counting the number of processes in a distributed system
• Detecting termination
• Detecting deadlock
• Detecting loss of coordination

The distributed snapshot algorithm (Chapter 8) clarifies the notion of a consistent global state and helps record the fragments of such a state in the local state spaces of the individual processes, but it does not address the task of collecting these fragments. This chapter addresses that task and presents several algorithms for global state collection.