ABSTRACT

This chapter establishes context, defines the problem, and describes the algorithms and procedures that automatically transform the raw signals collected from the machines in the datacenter into crisis fingerprints. A fingerprint is intended to be a representation of a crisis that uniquely identifies that crisis. In other cases of incorrectly merged crisis types, the large majority of signals are indistinguishable between the two types, but a few signals show distinct behavior. The signals correspond to hardware, operating system, application, or runtime-level measurements, such as the size of the object heap or the number of threads waiting in the run queue. They include counts of alerts set up by the operators, queue lengths, latencies of intermediate processing steps, summaries of CPU utilization, and various application-specific signals.