ABSTRACT

The conventional random forest algorithm achieves low classification accuracy in fault diagnosis when the data contain a large number of unlabeled samples. To address this problem, this chapter proposes an improved random forest algorithm based on semi-supervised learning. Rather than discarding the unlabeled samples, the algorithm draws additional labeled samples from them, which enlarges the labeled set. It addresses the difficulty that conventional semi-supervised learning has with large-scale data analysis, and it offers a remedy for the low prediction accuracy of the random forest algorithm when a large number of samples are unlabeled. To illustrate the effectiveness of the proposed method, an experiment is conducted on a wind turbine drivetrain diagnostic simulator. The experimental results show that the improved random forest algorithm has both theoretical and practical value in the field of fault diagnosis.