ABSTRACT

Ensemble learning can be regarded as applying this “crowdwisdom” to the task of classification. Classification, or supervised learning, tries to infer a function that maps feature values into class labels from training data, and apply the function to data with unknown class labels. The function is called a model or classifier. We can regard a collection of some possible models as a hypothesis space H , and each single point h ∈ H in this space corresponds to a specific model. A classification algorithm usually makes certain assumptions about the hypothesis space and thus defines a hypothesis space to search for the correct model. The algorithm also defines a certain criterion to measure the quality of the model so that the model that has the best measure in the hypothesis space will be returned. A variety of classification algorithms have been developed [8], including Support Vector Machines [20,22], logistic regression [47], Naive Bayes, decision trees [15,64,65], k-nearest neighbor algorithm [21], and Neural Networks [67, 88]. They differ in the hypothesis space, model quality criteria and search strategies.