ABSTRACT

Availability is the probability that a system can perform its required function when called upon, that is, that it has not failed and is not undergoing a repair or update action. Availability can also be defined as the ability to guarantee no loss of data and the subsequent recovery of the system in a reasonable amount of time. The availability of a system depends on the availability of its configuration of supporting blocks, such as local area networks (LANs), wide area networks (WANs), and routers, as well as on the computer system itself, with its layers of software (SW). The chapter starts by defining system availability, the maturity model, disaster recovery, and business continuity planning (BCP). It then explains aspects related to availability and repair, including Mean Time to Failure (MTTF), Mean Time to Repair (MTTR), and Mean Time Between Failures (MTBF). The last part of the chapter discusses aspects related to the availability of services, namely Service-Level Agreements (SLAs), Quality of Service (QoS), and metrics for interfacing to cloud service providers. This chapter’s appendix describes aspects related to replication, which is the copying of data from one system and its disks to another system with its own completely independent and redundant set of disks. Replication is not the same as disk mirroring: mirroring treats both sets of disks as a single logical volume with enhanced availability, while replication treats the disk sets as two completely independent items. Mirroring is confined to a single computer system, while replication moves data from one system to another.