ABSTRACT

This chapter introduces two classification methods: the k-nearest neighbor classifier and supervised clustering, which includes the k-nearest neighbor classifier as a part of the method. It describes many measures of similarity or dissimilarity, including the Euclidean distance, the Minkowski distance, the Hamming distance, Pearson’s correlation coefficient, and cosine similarity. When the Minkowski distance measure is used, different attribute variables may have different means, variances, and ranges, and so bring different scales into the distance computation. To remove these scale differences, the same normalization method is applied to all the attribute variables, and the normalized attribute variables are then used to compute the Minkowski distance. The supervised clustering algorithm was developed and applied to cyber attack detection, where it classifies the observed data of computer and network activities into one of two target classes: attacks and normal use activities. For cyber attack detection, the training data contain large amounts of computer and network data for learning data patterns of attacks and normal use activities.
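
The normalize-then-measure step described above can be illustrated with a minimal Python sketch. It is not the chapter's own implementation. It assumes min-max scaling as the normalization method (the abstract does not say which one is used), a simple majority vote among the k nearest neighbors, and a small set of made-up two-attribute records. The function names, the value k = 3, and the data are all hypothetical.

```python
from collections import Counter


def min_max_normalize(rows):
    """Rescale every attribute (column) to [0, 1] using the same method.

    Min-max scaling is an assumption made for this sketch.
    Returns the normalized rows and a function that applies the same
    scaling to a new record.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(row):
        # A constant attribute carries no information, so map it to 0.0.
        return [
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ]

    return [scale(row) for row in rows], scale


def minkowski_distance(x, y, r=2):
    """Minkowski distance of order r (r=1: Manhattan, r=2: Euclidean)."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)


def knn_classify(train_x, train_y, query, k=3, r=2):
    """Assign the majority class among the k training records nearest to query."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: minkowski_distance(pair[0], query, r),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical records: two attributes on very different scales.
train_raw = [
    [1.0, 200.0], [2.0, 250.0], [1.5, 220.0],
    [8.0, 900.0], [9.0, 950.0], [8.5, 1000.0],
]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]

train_x, scale = min_max_normalize(train_raw)
predicted = knn_classify(train_x, labels, scale([8.2, 920.0]), k=3)
print(predicted)
```

Without the scaling step, the second attribute would dominate the distance simply because its range is about a hundred times larger than that of the first. The scaling bounds are learned from the training records and reused for the query record, so both are measured on the same scale.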