ABSTRACT

System reliability is defined as the combination of system resiliency and elasticity. With the proliferation of web-scale, data- and process-intensive applications across industry verticals, application reliability must be ensured to meet varying business expectations. At the same time, cloud environments have emerged as the one-stop information technology (IT) solution for automating business processes and operations. Personal, professional and social applications of all kinds are being modernized and migrated to cloud centers to reap the originally promised benefits of software-defined cloud infrastructures. Cloud system reliability, in turn, has to be ensured by leveraging pioneering technologies and tools. Reliable applications and IT infrastructures are essential for retaining customer confidence and loyalty, and this reliability can be achieved by embracing the innovations and improvements emerging in the IT space. This chapter discusses reliability best practices for software engineers, site reliability engineers (SREs), DevOps professionals and cloud operations teams. It explains how technologies and tools such as containerization, container orchestration platforms, service mesh solutions and application programming interface (API) gateways work together to produce resilient microservices, which, in turn, can be composed into reliable, enterprise-grade applications.