ABSTRACT

A fault is the manifestation of an unexpected behavior, and fault-tolerance is a mechanism that masks or restores the expected behavior of a system following the occurrence of faults. Attention to fault-tolerance or dependability has drastically increased over recent years due to our increased dependence on computers to perform critical as well as noncritical tasks. Also, the increase in the scale of such systems indirectly contributes to the rising number of faults. Advances in hardware engineering can make the individual components more dependable, but it cannot eliminate faults altogether. Bad system designs can also contribute to failures.