Overview of outlier detection methods and evaluation metrics: A review

doi:10.1201/9781003559092-127

ABSTRACT

An outlier is an anomalous activity or data point that is considered suspicious because it deviates significantly from the overall population. Outlier detection is the task of identifying data objects that are exceptional and dissimilar from the majority of data points, setting them apart from the rest of the population. It is an important task with extensive applications across diverse domains such as fraud detection, medical diagnosis, and intrusion detection. Within data mining and machine learning, outlier detection has attracted attention from researchers in many fields, and numerous methodologies have been devised to detect outliers. The aim of this study is to provide a concise review of outliers, including their types, various outlier detection techniques, and the evaluation metrics used to gauge the effectiveness of these techniques. The study begins by defining the concept of an outlier and its categories. It then briefly examines the main classes of outlier detection approaches: statistical, distance-based, density-based, clustering-based, and ensemble methods. The article also distinguishes noise from outliers, emphasizing that outliers, unlike noise, contain valuable insights and observations. Evaluation metrics such as precision, recall, the ROC curve, and AUC are discussed in the context of outlier detection. This review aims to contribute a foundational understanding of outliers and outlier detection methods.
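As an illustration of how a simple statistical detector and evaluation metrics such as precision, recall, and AUC fit together, the following minimal Python sketch scores each point by its absolute z-score, flags points above a threshold, and evaluates the result against known labels. The toy data, the labels, the threshold of 2.0, and the helper names (`zscore_scores`, `flag`, `precision_recall`, `roc_auc`) are hypothetical choices for illustration, not taken from the review. AUC is computed here through its pairwise (Mann-Whitney) interpretation rather than by integrating a plotted ROC curve.

```python
import statistics


def zscore_scores(values):
    """Outlier score for each value: its absolute z-score (a basic statistical method)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [abs(v - mean) / std for v in values]


def flag(scores, threshold):
    """Label a point as an outlier (1) when its score exceeds the threshold, else inlier (0)."""
    return [1 if s > threshold else 0 for s in scores]


def precision_recall(labels, predicted):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). 1 = outlier, 0 = inlier."""
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly chosen
    outlier receives a higher score than a randomly chosen inlier (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two injected outliers (25.0 and -3.0) among values near 10.
values = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 25.0, 9.7, 10.1, -3.0]
labels = [0, 0, 0, 0, 0, 0, 1, 0, 0, 1]

scores = zscore_scores(values)
predicted = flag(scores, threshold=2.0)
precision, recall = precision_recall(labels, predicted)
auc = roc_auc(labels, scores)
print(f"precision={precision:.2f} recall={recall:.2f} auc={auc:.2f}")
```

Raising the threshold trades recall for precision: with a threshold of 2.2, only the 25.0 point is flagged, so recall drops to 0.5, while AUC, which depends only on how the scores rank outliers against inliers, is unchanged. Note also that the two outliers inflate the standard deviation themselves, which is why their z-scores stay modest on such a small sample.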