ABSTRACT

In this chapter, we investigate ensemble approaches that combine feature selection and data classification for cancer diagnosis and credit scoring. First, an ensemble of feature selection techniques is applied, with each member yielding a different feature set. Two alternatives are then tested. The first combines these feature sets into a single feature set on which one classifier is trained. The second trains a classifier on each feature set and then combines the resulting classifier ensemble to obtain a single classification output. We hypothesize that the reliability of the prediction resulting from each combination level differs depending on the data dimensionality. Thus, in such an ensemble system, it is necessary to identify the appropriate combination level to obtain the best classification results. The proposed ensemble approaches are evaluated on two high-dimensional data sets concerned with cancer diagnosis and on two small-size data sets concerned with credit scoring. Evaluation results suggest that the ensemble approaches outperform the baseline models and that data set dimensionality can guide the choice of the aggregation level of the ensemble method.
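
The two combination levels described above can be illustrated with a minimal sketch. It is not the chapter's implementation: the vote threshold used to merge feature sets, the nearest-centroid classifier, the majority-vote rule, and all function names are assumptions chosen only to keep the example self-contained. Features are referred to by column index, and each entry of `feature_sets` stands for the output of one feature selection technique.

```python
from collections import Counter


def combine_feature_sets(feature_sets, min_votes=2):
    """Feature-level aggregation (assumed rule): keep every feature
    selected by at least `min_votes` of the feature selectors."""
    votes = Counter(f for fs in feature_sets for f in set(fs))
    return sorted(f for f, n in votes.items() if n >= min_votes)


def majority_vote(predictions):
    """Decision-level aggregation (assumed rule): per-sample majority
    vote over the outputs of several classifiers."""
    return [Counter(per_sample).most_common(1)[0][0]
            for per_sample in zip(*predictions)]


def train_centroids(X, y, features):
    """Stand-in classifier: one mean vector per class, restricted to
    the given feature indices."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, f in enumerate(features):
            acc[i] += row[f]
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}


def predict(centroids, X, features):
    """Assign each row to the class with the nearest centroid
    (squared Euclidean distance on the selected features)."""
    def dist(row, centre):
        return sum((row[f] - centre[i]) ** 2 for i, f in enumerate(features))
    return [min(centroids, key=lambda label: dist(row, centroids[label]))
            for row in X]


def feature_level_ensemble(X_train, y_train, X_test, feature_sets, min_votes=2):
    """First alternative: merge the feature sets, then train one classifier."""
    features = combine_feature_sets(feature_sets, min_votes)
    model = train_centroids(X_train, y_train, features)
    return predict(model, X_test, features)


def decision_level_ensemble(X_train, y_train, X_test, feature_sets):
    """Second alternative: train one classifier per feature set,
    then combine their predictions."""
    all_predictions = []
    for features in feature_sets:
        model = train_centroids(X_train, y_train, features)
        all_predictions.append(predict(model, X_test, features))
    return majority_vote(all_predictions)
```

Both functions take the same inputs, so the two aggregation levels can be compared on the same train/test split. With an even number of feature selectors the majority vote can tie; this sketch resolves a tie in favour of the label that appears first among the classifier outputs, which is an arbitrary choice.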