ABSTRACT

Historically, models of failures have been linked with the level of abstraction in the specification of a system. A VLSI designer may focus on stuck-at-0 and stuck-at-1 faults, where the outputs of certain gates are permanently stuck at either 0 or 1 regardless of input variations. A system-level hardware designer, on the other hand, may be willing to view a failure as any arbitrary or erroneous behavior of a module as a whole. A dip in the power supply voltage, or radio interference due to lightning or a cosmic shower, can cause transient failures by perturbing the system state without causing any permanent damage to the hardware. Messages propagating from one process to another may be lost in transit. Finally, even if hardware does not fail, software may fail due to code corruption, system intrusions, improper or unexpected changes in the specifications of the system, environmental changes, or human error.