ABSTRACT

This chapter discusses an experimental system that uses big data analytics and cloud computing for malware detection, and shows how big data analytics techniques can be applied to the problem. Malware is a potent vehicle for many successful cyber attacks every year, including data and identity theft, system and data corruption, and denial of service; it therefore constitutes a significant security threat to many individuals and organizations. Malware includes viruses, worms, Trojan horses, time and logic bombs, botnets, and spyware. The problem of detecting malware using data mining involves classifying each executable as either benign or malicious. The chapter describes the design and implementation of an efficient and scalable feature-extraction and feature-selection technique using a cloud computing framework. MapReduce is an increasingly popular distributed programming paradigm used in cloud computing environments. Many intrusion detection problems can be formulated as classification problems for infinite-length, concept-drifting data streams.