A Comparative Analysis of Tree-Based Algorithms in Malware Detection

doi:10.1201/9781003218555-11

Chapter

ABSTRACT

Machine learning, a branch of artificial intelligence, has advanced rapidly since its inception. It now finds applications in many domains where developing conventional algorithms is not feasible. One such domain is cyber security: in today's world, malware detection is a major concern for anyone connected to a network. Malware is software deployed with malicious intent, and it is as easy to deploy as its source is difficult to trace. It is known to disable antivirus software and other protection mechanisms, compromising information security. It is often polymorphic and can survive traditional security scans. It can also self-replicate or persist indefinitely, surviving system reboots by configuring itself to run at initialization. Cyber security can be achieved through the structured development and application of software engineering techniques; however, it remains a major concern because cyber-crime has increased immensely. According to Gartner Research, the worldwide information security market is expected to reach a value of $170 billion by 2022. Cybint Solutions states that about 62% of businesses experienced phishing and social engineering attacks in 2018, and according to Accenture, 68% of business leaders feel their cybersecurity risks are increasing. An article in Cybercrime Magazine by Cybersecurity Ventures claims that, by 2021, the damage caused by cyber-crime is anticipated to reach $6 trillion annually.

Tree-based algorithms are learning algorithms whose structure resembles that of a tree. The most basic model, the decision tree, starts from a root node that branches into smaller nodes; the other models combine many instances of this basic model, each in its own way, to make their predictions. These models have proved to be successful predictors in numerous industry applications, and they are dependable enough to achieve high accuracy in their predictions.
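The idea of a root node that branches, and of an ensemble built from many copies of that basic model, can be illustrated with a minimal pure-Python sketch. It is not the chapter's implementation: the one-split "stump", the bagging-style majority vote (in the spirit of a random forest, but without feature subsampling), and the toy features (file entropy and import count) are all assumptions made for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity: 0.0 when every label in the list is the same."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y):
    """Fit a one-split tree: a root node that branches into two leaves."""
    n, best = len(y), None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue  # threshold does not separate anything
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority label
        m = majority(y)
        return (0, float("inf"), m, m)
    return best[1:]  # (feature index, threshold, left label, right label)


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_ensemble(X, y, n_models=11, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_models):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_ensemble(stumps, row):
    """Combine the individual trees by majority vote."""
    return majority([predict_stump(s, row) for s in stumps])


# Hypothetical toy data: [file entropy, import count]; 1 = malware, 0 = benign.
X = [[7.8, 3], [7.5, 5], [7.9, 2], [7.2, 4],
     [4.1, 40], [5.0, 55], [4.6, 38], [5.3, 60]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

stump = fit_stump(X, y)
ensemble = fit_ensemble(X, y)
print(predict_stump(stump, [8.0, 3]), predict_ensemble(ensemble, [3.5, 50]))
```

A real decision tree applies the same split search recursively to each branch, and boosting methods such as AdaBoost and XGBoost train their trees sequentially rather than independently; the sketch shows only the simplest case of each idea.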

This chapter presents the application of machine learning to malware detection through a comparative study of tree-based algorithms: decision trees, the AdaBoost classifier, the random forest classifier, and the XGBoost classifier. An analytical and graphical analysis is also provided to reach an effective conclusion. The chapter also explains the architectures of these algorithms and recent trends in their use.
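A comparison of this kind could be set up along the following lines. This is only a sketch and not the chapter's experimental setup: it assumes scikit-learn is installed, uses synthetic data from `make_classification` in place of a real malware dataset, and substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost so that no additional package is required.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labelled malware/benign feature table (assumption).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Stand-in for XGBoost: another gradient-boosted tree ensemble.
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

# Fit every model on the same split and record its test accuracy.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

# Report the models from most to least accurate on the held-out data.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>18}: {acc:.3f}")
```

Accuracy on a single split is a coarse measure; for malware detection, where false negatives are costly, precision, recall, and cross-validated scores would normally be reported as well.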