ABSTRACT

Random forest (RF) is an ensemble classifier composed of many decision trees (Hastie, Tibshirani, & Friedman 2009). Each member of the ensemble is a decision tree, so the collection of trees forms a "forest". Random forest improves on bagging by de-correlating the trees; since the bootstrapped trees are identically distributed, each decision tree in the RF has the same expectation. The final prediction is the majority vote over the outputs of the individual trees (Drucker, Vapnik, & Wu 1999). To create the set of decision trees, the method combines the bagging idea with random selection of features, introducing controlled variation among the trees. Given the original training dataset D, let d denote the number of samples and m the number of available features. The pseudo-code of the full RF algorithm is summarized in Algorithm I; it is a process of generating k decision trees. At each iteration (i = 1, 2, ..., k), d examples are drawn at random, with replacement, from the original dataset to form the training set of the i-th ensemble member. That is, each Di is a set of bootstrapped instances: some examples may appear several times in Di, while others may not appear at all.
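
To make the sampling and voting steps concrete, the sketch below implements the loop of Algorithm I in Python, assuming scikit-learn's DecisionTreeClassifier as the base learner; the function names, the default k, and the max_features setting are illustrative choices, not part of the original pseudo-code.

    # Minimal sketch of the RF training loop described above (Algorithm I),
    # assuming sklearn's DecisionTreeClassifier as the base learner and
    # non-negative integer class labels. Names (build_random_forest, k, d)
    # follow the notation in the text and are otherwise hypothetical.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def build_random_forest(X, y, k=100, seed=0):
        """Grow k trees, each on a bootstrap sample Di of the d training examples."""
        rng = np.random.default_rng(seed)
        d = X.shape[0]                  # number of samples in the original dataset D
        forest = []
        for _ in range(k):              # i = 1, 2, ..., k
            # Draw d examples with replacement: some appear several times,
            # others may not appear in Di at all (the bootstrap sample).
            idx = rng.integers(0, d, size=d)
            # max_features="sqrt" restricts each split to a random feature
            # subset, which de-correlates the trees.
            tree = DecisionTreeClassifier(max_features="sqrt")
            tree.fit(X[idx], y[idx])
            forest.append(tree)
        return forest

    def predict_majority_vote(forest, X):
        """Final prediction: majority vote over the outputs of the individual trees."""
        votes = np.stack([tree.predict(X) for tree in forest])   # shape (k, n_samples)
        return np.apply_along_axis(
            lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)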