ABSTRACT

Ensemble diversity, that is, the difference among the individual learners, is a fundamental issue in ensemble methods.

Intuitively, it is easy to see that, to gain from combination, the individual learners must be different; combining identical learners would yield no performance improvement. Tumer and Ghosh [1995] analyzed the performance of the simple soft voting ensemble using the decision boundary analysis introduced in Section 4.3.5.2, introducing a term θ to describe the overall correlation among the individual learners. They showed that the expected added error of the ensemble is

\[
\mathrm{err}_{add}^{ssv}(H) = \frac{1 + \theta (T - 1)}{T} \, \mathrm{err}_{add}(h) , \tag{5.1}
\]

where $\mathrm{err}_{add}(h)$ is the expected added error of the individual learners (for simplicity, all individual learners were assumed to have equal error), and T is the ensemble size. Equation (5.1) shows that if the learners are independent, i.e., θ = 0, the ensemble reduces the expected added error by a factor of T relative to the individual learners; if the learners are totally correlated, i.e., θ = 1, no gain can be obtained from the combination. This analysis clearly shows that diversity is crucial to ensemble performance. A similar conclusion can be obtained for other combination methods.
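
To make the two limiting cases concrete, the following minimal Python sketch evaluates the right-hand side of (5.1) for a few values of θ. The function name `ensemble_added_error` and the sample values (T = 10, individual added error 0.2) are illustrative assumptions and do not come from Tumer and Ghosh [1995]; the code only evaluates the formula and does not simulate an ensemble.

```python
def ensemble_added_error(individual_added_error: float,
                         theta: float,
                         ensemble_size: int) -> float:
    """Evaluate the right-hand side of (5.1).

    individual_added_error: expected added error of each individual learner
        (assumed equal for all learners, as in the text).
    theta: overall correlation among the individual learners.
    ensemble_size: number of individual learners, T.
    """
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    factor = (1 + theta * (ensemble_size - 1)) / ensemble_size
    return factor * individual_added_error


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    T = 10
    err_h = 0.2
    for theta in (0.0, 0.5, 1.0):
        err_H = ensemble_added_error(err_h, theta, T)
        print(f"theta={theta:.1f}  ensemble added error={err_H:.4f}")
```

With these assumed values, θ = 0 gives an ensemble added error of about 0.02 (a T-fold reduction), θ = 1 leaves it at 0.2, and intermediate correlations fall linearly between the two.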