ABSTRACT

The study of distributed systems and algorithms helps in understanding what distinguishes these systems from classic centralized systems: information is local (each element of the system holds only a fraction of the information and must obtain more by communicating with other elements) and time is local (the elements of the system can run their instructions at different speeds). These two factors result in nondeterministic behavior, as two consecutive executions of the same distributed system are likely to differ. The fact that certain elements of the system can become faulty further increases this nondeterminism and the difficulty of predicting the overall system’s behavior. When the number of components in a distributed system is increased, the likelihood that one or several of these components become faulty also increases. When the production costs of these components are reduced to achieve economies of scale, the rate of potential defects again increases. Finally, when the components of the system are deployed in an environment that is not necessarily controlled, the risk of faults becomes impossible to overlook.