Feature selection for optimizing the Naive Bayes algorithm

doi:10.1201/9780429322235-10

ABSTRACT

Naive Bayes is a data-mining method used to classify text documents. Its advantages are a simple algorithm and low computational complexity. However, Naive Bayes assumes that features are independent of one another. This assumption does not always hold in practice, and when it fails, classification accuracy suffers. Naive Bayes therefore needs to be optimized by weighting its features with the gain ratio. Weighting alone still causes a problem when the probability of each document is calculated: many features that do not represent the tested class still contribute to the score, which leads to misclassification. Weighted Naive Bayes is therefore still not optimal. This article proposes optimizing Naive Bayes with the gain ratio, a feature selection method for text classification, used both to select features and to weight them. The results of this study indicate that Naive Bayes optimized with feature selection and gain-ratio weighting achieves an accuracy of 94%.
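The abstract does not give the exact formulas. The sketch below is one common way to combine the two ideas, under stated assumptions: each term is treated as a binary present/absent attribute when its gain ratio is computed, only the top-k terms by gain ratio are kept, and each kept term's log-likelihood in a multinomial Naive Bayes with Laplace smoothing is multiplied by its gain ratio. The function names (`gain_ratio`, `select_features`, `train`, `predict`) and the toy data are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(docs, labels, term):
    """Gain ratio of a term, treated (by assumption) as a binary present/absent attribute."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    cond_entropy = 0.0
    split_info = 0.0
    for part in (present, absent):
        if part:
            p = len(part) / n
            cond_entropy += p * entropy(part)
            split_info -= p * math.log2(p)
    if split_info == 0.0:
        return 0.0  # term occurs in every document or in none: uninformative
    return (entropy(labels) - cond_entropy) / split_info


def select_features(docs, labels, k):
    """Keep the k terms with the highest gain ratio; return {term: weight}."""
    vocab = {t for d in docs for t in d}
    scores = {t: gain_ratio(docs, labels, t) for t in vocab}
    top = sorted(scores, key=lambda t: (-scores[t], t))[:k]  # ties broken alphabetically
    return {t: scores[t] for t in top}


def train(docs, labels, weights):
    """Multinomial Naive Bayes over the selected terms, with Laplace smoothing."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    counts = {c: Counter() for c in classes}
    for d, y in zip(docs, labels):
        counts[y].update(t for t in d if t in weights)
    vocab_size = len(weights)
    log_likelihood = {}
    for c in classes:
        total = sum(counts[c].values())
        log_likelihood[c] = {
            t: math.log((counts[c][t] + 1) / (total + vocab_size)) for t in weights
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood, weights):
    """Pick the class maximizing log prior + sum of gain-ratio-weighted log-likelihoods."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for t in doc:
            if t in weights:  # unselected terms are ignored entirely
                score += weights[t] * log_likelihood[c][t]
        scores[c] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [["cheap", "pills", "buy"], ["buy", "cheap", "watch"],
            ["meeting", "schedule", "project"], ["project", "report", "meeting"]]
    labels = ["spam", "spam", "ham", "ham"]
    weights = select_features(docs, labels, k=4)
    prior, lik = train(docs, labels, weights)
    print(predict(["cheap", "buy", "now"], prior, lik, weights))  # spam
```

Dropping low-scoring terms addresses the problem the abstract describes, where features unrelated to the tested class still enter the document probability; the gain-ratio weights then scale how much each remaining term contributes.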