ABSTRACT
Identifying anomalous activity in a system is important for its health, integrity, and performance. As logs grow in complexity and volume, traditional rule-based approaches are being supplanted by machine learning solutions capable of real-time analysis and of detecting errors and events across various log formats. This survey covers recent methodologies for log anomaly detection, with emphasis on supervised, semi-supervised, content-based, clustering, and retrieval-based methods. It analyzes each algorithm's approach, input data, time complexity, and generalizability to other datasets. To clarify how these approaches differ, it also compares their fundamental performance characteristics, including accuracy, scalability, and data preprocessing requirements. The survey aims to provide an understanding of contemporary themes in log anomaly detection and to suggest directions for future research on extending Artificial Intelligence and Machine Learning to augment log analysis.
