ABSTRACT

Our digital presence grows every day, and with it the risk of falling victim to digital crime; such attacks occur primarily through vulnerable networks. A network intrusion is unauthorized access to a computer in a business or to a specific address in a domain that contains digital assets. A network therefore needs an intrusion detection system (IDS) to keep its data safe from attackers. IDS has been a popular research topic for the last decade, and in recent years researchers have applied machine learning techniques to network intrusion detection for their performance, accuracy, and robustness. Our study compares the most widely used machine learning algorithms in order to identify the best approach. We used a US military intrusion dataset with proper data preprocessing, which increased the performance of the algorithms. The study examines the effectiveness of a number of well-known classifiers using a variety of metrics: accuracy, precision, recall, F1-score, Cohen's kappa, specificity, and area under the curve (AUC). We also measured the elapsed run-time of each algorithm on the machine used for the study. The Random Forest classifier provides the best result of all algorithms studied, with an accuracy of 99%, a precision of 100%, and an F1-score of 100%. We then used SHAP, an AI-based mathematical method, to explain the results of the algorithms, which shows why Random Forest performs best. In summary, this study compares the most widely used intrusion detection approaches, with data preprocessing and ten-fold cross-validation, and further explains the acceptability of Random Forest.
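
As an illustration of the confusion-matrix metrics named above, the following minimal sketch computes accuracy, precision, recall, specificity, F1-score, and Cohen's kappa for a binary (normal vs. intrusion) task. It is not the study's code: the function name `binary_metrics`, the label encoding (1 = intrusion, 0 = normal), and the toy labels are assumptions made for this example only. AUC is omitted because it requires predicted scores rather than hard labels.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    specificity: float
    f1: float
    cohen_kappa: float


def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics; assumes 1 = intrusion, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0

    return BinaryMetrics(accuracy, precision, recall, specificity, f1, kappa)


# Toy labels, invented for illustration (not drawn from the study's dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Under ten-fold cross-validation, a function of this kind would be applied to the predictions of each held-out fold and the per-fold values averaged.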