ABSTRACT

Fault-tolerance techniques can be divided into two broad classes: masking and nonmasking. Certain applications call for masking tolerance, in which the effect of a failure is completely invisible to the application; these include safety-critical systems, some real-time systems, and certain sensitive database applications in the financial world. For others, nonmasking tolerance, in which the effect of a failure may be temporarily visible before the system recovers, is considered adequate. In the area of control systems, feedback control is a type of nonmasking tolerance that has been used for more than a century. Once the system deviates from its desired state, a detector detects the deviation and sends a correcting signal that restores the system to its desired state. Rollback recovery (Chapter 14) is a type of nonmasking tolerance, known as backward error recovery, that aims to make the history of the computation correct and relies on saving intermediate states, or checkpoints, on stable storage. Stabilization (also called self-stabilization), on the other hand, does not rely on the integrity of any kind of data storage and makes no attempt to recover lost computation, but it guarantees that a good configuration is eventually restored. This is why it is called forward error recovery.
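As a minimal illustration of forward error recovery, consider Dijkstra's K-state token ring, a classic self-stabilizing algorithm. It is not given in this section and is used here only as an example. The sketch assumes N processes in a unidirectional ring, each holding a counter in 0..K-1 with K ≥ N, and a central scheduler that lets one privileged process move at a time. The function names `privileged`, `step`, and `stabilize` are hypothetical. Starting from an arbitrary, possibly corrupted state, the ring converges to a good configuration (exactly one privileged process, i.e., one token) without consulting any saved checkpoint.

```python
import random

def privileged(x):
    """Indices of processes currently holding a privilege (token).
    Hypothetical helper for Dijkstra's K-state ring."""
    n = len(x)
    result = []
    if x[0] == x[n - 1]:          # process 0 compares with its ring predecessor
        result.append(0)
    for i in range(1, n):
        if x[i] != x[i - 1]:      # other processes copy their predecessor when different
            result.append(i)
    return result

def step(x, k, i):
    """Let privileged process i execute its move (one move at a time)."""
    if i == 0:
        x[0] = (x[0] + 1) % k     # process 0 "invents" a new value
    else:
        x[i] = x[i - 1]           # process i copies its predecessor

def stabilize(x, k, rng, max_moves=100_000):
    """Run from an arbitrary state until exactly one process is privileged.
    No checkpoint is used: recovery moves forward from whatever state exists.
    Returns the number of moves taken."""
    moves = 0
    while len(privileged(x)) != 1:
        if moves >= max_moves:
            raise RuntimeError("did not stabilize")
        step(x, k, rng.choice(privileged(x)))  # scheduler picks one privileged process
        moves += 1
    return moves

if __name__ == "__main__":
    rng = random.Random(1)
    n = k = 5
    x = [rng.randrange(k) for _ in range(n)]   # arbitrary initial state
    print("initial:", x, "privileged:", privileged(x))
    stabilize(x, k, rng)
    print("stable: ", x, "privileged:", privileged(x))
    x[2] = (x[2] + 3) % k                       # inject a transient fault
    stabilize(x, k, rng)
    print("after fault:", x, "privileged:", privileged(x))
```

Once exactly one process is privileged, every subsequent move passes the single privilege to exactly one other process, so the good configuration is preserved (closure). A transient fault that corrupts any counter simply starts a new convergence phase.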