ABSTRACT

To classify poisonous mushrooms, the group used a dataset from Kaggle to train SVM, naive Bayes, and random forest models. After the first round of training, the models achieved an accuracy close to 1, and the group supervisors suspected that the models were over-fitting. When the team revisited the data analysis stage, however, it found that the high accuracy stemmed from the data being nearly linearly separable, which follows from the working principle of the SVM. The significance of this paper is that it provides an unconventional method for determining whether a dataset is linearly separable.
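
As a minimal illustration of the idea summarized above (not the authors' exact pipeline), a linear SVM can serve as a separability probe: if a linear SVM with a very large regularization parameter C fits the training data with near-perfect accuracy, the data are nearly linearly separable. The file name, column names, and threshold below are assumptions for the Kaggle mushroom dataset.

```python
# Sketch (assumed filenames/columns): probe linear separability by checking
# whether a hard-margin-like linear SVM can fit the training data almost perfectly.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import LinearSVC

# Hypothetical file and column names for the Kaggle mushroom dataset.
df = pd.read_csv("mushrooms.csv")
X_raw = df.drop(columns=["class"])            # categorical features
y = (df["class"] == "p").astype(int)          # 1 = poisonous, 0 = edible

# One-hot encode the categorical features; LinearSVC accepts the sparse output.
X = OneHotEncoder(handle_unknown="ignore").fit_transform(X_raw)

# A very large C approximates a hard-margin SVM: misclassified training points
# are heavily penalized, so near-perfect training accuracy suggests the data
# are (almost) linearly separable.
clf = LinearSVC(C=1e6, max_iter=100_000)
clf.fit(X, y)
train_acc = clf.score(X, y)

print(f"Training accuracy of hard-margin-like linear SVM: {train_acc:.4f}")
if train_acc > 0.999:
    print("Data appear (nearly) linearly separable.")
else:
    print("Data do not appear linearly separable.")
```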