ABSTRACT

In this chapter, we describe our data mining tool for detecting malicious executables. The tool extracts features using n-gram analysis. We first discuss how we extract binary n-gram features from the executables, and then show how we select the best features using information gain. We also discuss the memory and scalability problems associated with n-gram extraction and selection, and how we solve them. Then we describe how the assembly features and dynamic link library (DLL) call features are extracted. Finally, we describe how we combine these three kinds of features and train a classifier using them.
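The first two steps, binary n-gram extraction and information-gain feature selection, can be sketched as follows. This is a minimal in-memory illustration, not the tool's actual implementation. It assumes byte-level n-grams, features treated as simple presence/absence indicators, and a two-class label (malicious or benign). The function names `binary_ngrams`, `information_gain`, and `select_top_k` are hypothetical.

```python
import math


def binary_ngrams(data: bytes, n: int) -> set[bytes]:
    """Return the set of distinct byte n-grams in a binary (hypothetical helper)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a set with `pos` malicious and `neg` benign samples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(feature: bytes, samples: list[tuple[set[bytes], bool]]) -> float:
    """Information gain of a presence/absence feature over labeled samples.

    Each sample is (set_of_ngrams, is_malicious).
    """
    pos = sum(1 for _, y in samples if y)
    base = entropy(pos, len(samples) - pos)
    with_f = [y for s, y in samples if feature in s]
    without_f = [y for s, y in samples if feature not in s]
    remainder = 0.0
    for subset in (with_f, without_f):
        if subset:
            p = sum(subset)  # number of malicious samples in this partition
            remainder += len(subset) / len(samples) * entropy(p, len(subset) - p)
    return base - remainder


def select_top_k(samples: list[tuple[set[bytes], bool]], k: int) -> list[bytes]:
    """Rank every candidate n-gram by information gain and keep the best k."""
    candidates = set().union(*(s for s, _ in samples))
    ranked = sorted(candidates, key=lambda f: (-information_gain(f, samples), f))
    return ranked[:k]


if __name__ == "__main__":
    # Toy byte strings standing in for executables (fabricated for illustration only).
    raw = [
        (b"\x4d\x5a\xeb\xfe\x00", True),
        (b"\xeb\xfe\x11\x22", True),
        (b"\x4d\x5a\x00\x01", False),
        (b"\x10\x20\x30", False),
    ]
    samples = [(binary_ngrams(data, 2), label) for data, label in raw]
    print(select_top_k(samples, 1))
```

Because this sketch keeps every candidate n-gram of every file in memory at once, it does not scale to large collections of executables; that is the memory and scalability problem the chapter addresses.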