ABSTRACT

We present a data mining-based technique for detecting malicious executables that scales to large datasets. We apply a multi-level feature extraction technique that combines three different kinds of features at different levels of abstraction. These are binary n-grams, assembly instruction sequences, and Dynamic Link Library (DLL) function calls, extracted from binary executables, disassembled executables, and executable headers, respectively. We apply this technique to a large corpus of real benign and malicious executables. Our model is compared against other feature-based approaches for malicious code detection and is found to be more efficient in terms of detection accuracy and false alarm rate.
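As a rough illustration of the multi-level idea, the sketch below counts overlapping byte n-grams in a binary and merges them with the other two feature kinds into a single feature set. It is only a minimal sketch under stated assumptions: the function names `byte_ngrams` and `combine_features`, the default n = 4, and the namespacing scheme are hypothetical and are not taken from the paper, and the assembly sequences and DLL call names are assumed to have been obtained already by a disassembler and a header parser.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 4) -> Counter:
    """Count every overlapping n-byte window in a binary (n = 4 is an assumed default)."""
    # A binary shorter than n bytes yields an empty range and an empty Counter.
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def combine_features(ngrams, instruction_seqs, dll_calls) -> set:
    """Merge the three feature kinds into one set (hypothetical scheme).

    Each feature is tagged with its level of abstraction so that features
    from different levels can never collide.
    """
    return (
        {("ngram", g) for g in ngrams}
        | {("asm", s) for s in instruction_seqs}
        | {("dll", c) for c in dll_calls}
    )


# Toy inputs: a few bytes standing in for an executable, plus made-up
# assembly sequences and DLL function names.
sample = b"\x4d\x5a\x90\x00\x4d\x5a"
grams = byte_ngrams(sample, n=2)
features = combine_features(
    grams,
    [("push", "mov"), ("mov", "call")],
    ["KERNEL32.CreateFileA"],
)
```

In practice the combined feature set would be far too large to use directly, so some feature selection step would normally follow before a classifier is trained; this sketch leaves that step out.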