ABSTRACT

This chapter describes a data mining technique for automatically generating signatures to defend against certain malware attacks. Detecting malware with data mining means classifying each executable as either benign or malicious: data mining-based approaches analyze the content of an executable and classify it as malware if a certain combination of features is found (or not found) in it. The malicious code detection problem can be modeled as a data mining problem for a stream that has both infinite length and concept drift, and many intrusion detection problems can likewise be formulated as classification problems for infinite-length, concept-drifting data streams. The chapter discusses the classification algorithm and proves its effectiveness analytically. EMPC uses generalized, multi-partition, multi-chunk ensemble learning. The chapter also describes a feature extraction and selection technique that uses cloud computing for malware detection.