ABSTRACT
In this chapter, we describe our data mining tool for detecting malicious executables. The tool extracts features using n-gram analysis. We first discuss how we extract binary n-gram features from the executables and then show how we select the best features using information gain. We also discuss the memory and scalability problems associated with n-gram extraction and selection and how we solve them. Then we describe how the assembly features and dynamic link library (DLL) call features are extracted. Finally, we describe how we combine these three kinds of features and train a classifier on them.
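As a rough illustration of the first two steps, binary n-gram extraction and selection by information gain, the following minimal Python sketch may help. It is not the tool's implementation: the function names (`byte_ngrams`, `information_gain`, `select_top_ngrams`) are hypothetical, each n-gram is treated as a simple present/absent feature, and everything is held in memory, so it does not address the memory and scalability problems discussed in the chapter.

```python
import math


def byte_ngrams(data: bytes, n: int) -> set:
    """Return the set of distinct n-byte sequences occurring in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a set with pos/neg class counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(feature_present, labels) -> float:
    """Information gain of a binary feature with respect to binary labels.

    feature_present and labels are parallel lists of booleans
    (label True = malicious, an assumption of this sketch).
    """
    total = len(labels)
    pos = sum(labels)
    gain = entropy(pos, total - pos)
    for value in (True, False):
        subset = [l for f, l in zip(feature_present, labels) if f == value]
        if subset:
            sub_pos = sum(subset)
            gain -= len(subset) / total * entropy(sub_pos, len(subset) - sub_pos)
    return gain


def select_top_ngrams(samples, labels, n: int, k: int) -> list:
    """Return the k n-grams with the highest information gain."""
    grams = [byte_ngrams(s, n) for s in samples]
    vocabulary = set().union(*grams)
    scored = [
        (information_gain([g in gs for gs in grams], labels), g)
        for g in vocabulary
    ]
    # Sort by descending gain; break ties by byte order for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [g for _, g in scored[:k]]


if __name__ == "__main__":
    # Toy byte strings standing in for executables (made-up data).
    samples = [b"\x90\x90\xcc", b"\x90\x90\x00", b"\x01\x02\x03", b"\x01\x02\x04"]
    labels = [True, True, False, False]
    print(select_top_ngrams(samples, labels, n=2, k=2))
```

In this toy input, the 2-grams that appear in every sample of one class and in no sample of the other receive the highest gain and are the ones selected.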