ABSTRACT

A skewed data set is one in which the samples of one class far outnumber those of the others. The class holding the large number of samples is called the negative (majority) class, and the other is called the positive (minority) class [1]. Classification of skewed data sets is a key research area in machine learning and pattern recognition and a major challenge for traditional algorithms, so new machine learning methods that solve the problem have both theoretical and practical value. Such problems arise regularly in the real world, for example in disease diagnosis, tumor recognition in medical images, and credit card fraud detection, where important information is lost because traditional classification methods do not take minority samples into account. In fact, in these problems the information carried by minority samples is more important than that carried by majority samples.