ABSTRACT

This chapter focuses on applications of clustering. First, it applies clustering to the problem of malware classification. Automatically classifying malware is a challenging task, and the chapter shows that reasonably good results can be obtained with clustering. It then attempts to take these results a step further by applying clustering to one of the most challenging problems in information security, namely, the automatic detection of new malware. Hidden Markov Models (HMMs) have been successfully applied to selected malware detection problems, and the chapter applies clustering, based on HMM scores, to the related problem of classifying malware. In particular, it compares K-means and expectation maximization (EM) clustering in the context of classifying new malware, through a set of experiments in which both EM and K-means clustering are performed on HMM scores. Classification is then based on the purity of the resulting clusters.
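To make the pipeline concrete, the sketch below clusters samples by their HMM score vectors with K-means and then measures cluster purity. It is a minimal illustration under stated assumptions, not the chapter's implementation. The score values are hypothetical. Each sample is assumed to be represented by a tuple of scores, one per trained HMM. Purity is taken in its usual sense: label each cluster by its majority family, then report the fraction of samples that match that label. The farthest-first initialization and all function names are choices made here for illustration. EM clustering, the other method the chapter compares, is not shown.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic farthest-first initialization (an assumption)."""
    # Farthest-first init: start from the first point, then repeatedly add the
    # point whose distance to its nearest chosen centroid is largest.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def purity(labels, families):
    """Fraction of samples that belong to the majority family of their cluster."""
    clusters = {}
    for cluster, family in zip(labels, families):
        clusters.setdefault(cluster, []).append(family)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(labels)


# Hypothetical scores: (score under an HMM trained on family A, score under one trained on B).
scores = [(-1.0, -5.0), (-1.2, -4.8), (-0.9, -5.1), (-5.0, -1.0), (-4.9, -1.1), (-5.2, -0.8)]
families = ["A", "A", "A", "B", "B", "B"]
labels = kmeans(scores, k=2)
print(labels, purity(labels, families))  # prints [0, 0, 0, 1, 1, 1] 1.0
```

With well-separated scores like these, every cluster is pure. On real malware data, clusters typically mix families, and purity below 1.0 quantifies how much. A mixture-model (EM) clustering could be substituted for `kmeans` without changing the `purity` step.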