Computer Reliability

doi:10.1201/9781315220659-20

ABSTRACT

This chapter outlines the knowledge needed to estimate the reliability of any electronic system or subsystem within a computer. The first step in estimating the reliability of a computer system is to determine the likelihood of failure of each of the individual components that make up the system, such as resistors, capacitors, integrated circuits, and connectors. A computer system that uses parallel subsystems to improve reliability must incorporate some kind of arbitrator to determine which output to use at any given time. To model the reliability of any system, it is necessary to define the various fault-free and faulty states in which it could exist. Component failure may be caused by internal physical phenomena or by external environmental effects such as electromagnetic fields or power supply variations. Physical faults within a component can be characterized by their external electrical effects, and these effects are commonly classified into fault models.
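
As an illustration of the estimation steps summarized above, the following minimal Python sketch combines per-component failure likelihoods into a subsystem reliability, then duplicates that subsystem behind an arbitrator. It rests on assumptions the abstract does not state: each component has a constant failure rate (so its reliability over time t is exp(-rate * t)), failures are independent, and the arbitrator is modeled as a single element in series with the redundant pair. The failure rates, mission time, and function names are hypothetical and chosen only for illustration.

```python
import math


def component_reliability(failure_rate_per_hour, hours):
    """Survival probability over `hours`, assuming a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)


def series_reliability(reliabilities):
    """Every element must work, so the individual reliabilities multiply."""
    result = 1.0
    for r in reliabilities:
        result *= r
    return result


def parallel_reliability(reliabilities, arbitrator_reliability=1.0):
    """At least one redundant subsystem must work, and so must the arbitrator."""
    all_fail = 1.0
    for r in reliabilities:
        all_fail *= 1.0 - r
    return arbitrator_reliability * (1.0 - all_fail)


# Hypothetical failure rates in failures per hour (not taken from the chapter).
FAILURE_RATES = {
    "resistor": 1e-8,
    "capacitor": 5e-8,
    "integrated_circuit": 2e-7,
    "connector": 1e-7,
}
MISSION_HOURS = 10_000  # hypothetical mission time

# One subsystem: all four components in series.
subsystem = series_reliability(
    component_reliability(rate, MISSION_HOURS) for rate in FAILURE_RATES.values()
)

# Two identical subsystems in parallel, selected by a (hypothetical) arbitrator.
arbitrator = component_reliability(1e-8, MISSION_HOURS)
duplicated = parallel_reliability([subsystem, subsystem], arbitrator)

print(f"single subsystem:     {subsystem:.6f}")
print(f"duplicated + arbiter: {duplicated:.6f}")
```

Under these assumptions the arbitrator sets an upper bound on the reliability of the redundant arrangement, so duplication only helps when the arbitrator is more reliable than the subsystems it selects between.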