ABSTRACT

Incident management is a major activity of digital operations. Broadly, it involves activities such as ticket handling, proactive maintenance, patch management and configuration management. In this chapter we discuss processes to optimize and automate incident management. Incidents that have a pre-defined response (such as server restarts) can be automated through tools and scripts. We define standard operating procedures (SOPs) for maintenance activities, which help improve the response times for those activities. We use automation, self-service, proactive maintenance and incident-avoidance design to optimize the incident management process. The ticket automation process first classifies each ticket into one of the known categories. Each defined category has an automated response mechanism, and the mechanism corresponding to the ticket's category is chosen to respond to the ticket. Self-service portals that provide an interactive interface for administrators to carry out maintenance activities reduce the incident resolution time. For a ticket-avoidance design, we can use methods such as proactive maintenance, shift-left design, root cause analysis and quality improvement, automation of maintenance activities, system monitoring and health checks.
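
The ticket automation flow summarized above (categorize the ticket, then select the response defined for that category) can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the category names, keyword rules, handler functions and ticket fields are all hypothetical and not taken from the chapter, and a production system would likely use a more robust classifier than keyword matching.

```python
from typing import Callable, Dict, Optional, Tuple

Ticket = Dict[str, str]

# Hypothetical keyword rules for each known ticket category (assumption).
CATEGORY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "server_restart": ("unresponsive", "hung", "restart"),
    "disk_cleanup": ("disk full", "no space"),
}


def restart_server(ticket: Ticket) -> str:
    # Placeholder for a real restart script (assumption).
    return f"restart scheduled for {ticket['host']}"


def clean_disk(ticket: Ticket) -> str:
    # Placeholder for a real cleanup script (assumption).
    return f"cleanup scheduled for {ticket['host']}"


# Each known category maps to one automated response.
HANDLERS: Dict[str, Callable[[Ticket], str]] = {
    "server_restart": restart_server,
    "disk_cleanup": clean_disk,
}


def categorize(ticket: Ticket) -> Optional[str]:
    """Return the first category whose keywords appear in the description."""
    text = ticket["description"].lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None


def handle_ticket(ticket: Ticket) -> str:
    """Run the automated response for the ticket's category, if one exists."""
    category = categorize(ticket)
    handler = HANDLERS.get(category) if category is not None else None
    if handler is None:
        # Tickets outside the known categories go to a human operator.
        return "escalated to human operator"
    return handler(ticket)
```

In this sketch, a ticket that matches no known category is escalated rather than forced into the nearest category, which keeps the automated responses limited to incidents with a pre-defined response.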