ABSTRACT

Setting out to build a system that “does not fail” is not sufficient, because no such system can be built. Whatever we do, the hardware from which we build computer systems has a finite life, and we cannot predict precisely when any given system will fail. Without such a prediction, we cannot avoid the effects of hardware failure. One might hope to sidestep the problem by adding more hardware, but however much hardware we use, failure at some unpredictable point remains inevitable. Moreover, much or all of the hardware could be destroyed at once by a serious external trauma.