ABSTRACT

Bootstrap aggregating, also called bagging, is one of the first ensemble algorithms machine learning practitioners learn, and it is designed to improve the stability and accuracy of regression and classification algorithms. By averaging over many models, bagging reduces variance and helps to minimize overfitting. The method is general: it fits multiple versions of a prediction model, each to a bootstrap resample of the training data, and then combines them into a single aggregated prediction. The intuition behind bagging is often described as the "wisdom of the crowd" effect, popularized by James Surowiecki. A further benefit of building ensembles via bagging, which is based on resampling with replacement, is that it provides its own internal estimate of predictive performance via the out-of-bag sample. Bagging improves prediction accuracy for high-variance models at the expense of interpretability and computational speed.
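The following is a minimal sketch of the procedure described above: bagging regression trees and computing the out-of-bag error estimate, where each observation is scored only by the trees that never saw it during fitting. It assumes scikit-learn and NumPy are available; the synthetic data from make_regression is purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=42)

n_trees = 100
n = len(X)
trees = []
# Accumulate out-of-bag predictions: running sum and count per observation.
oob_sum = np.zeros(n)
oob_count = np.zeros(n)

for _ in range(n_trees):
    # Bootstrap: sample n rows with replacement.
    idx = rng.integers(0, n, size=n)
    oob_mask = np.ones(n, dtype=bool)
    oob_mask[idx] = False  # rows never drawn are "out of bag" for this tree

    # Fit one (high-variance) base learner to the bootstrap sample.
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])
    trees.append(tree)

    # The tree contributes OOB predictions only for rows it never saw.
    if oob_mask.any():
        oob_sum[oob_mask] += tree.predict(X[oob_mask])
        oob_count[oob_mask] += 1

# Internal performance estimate: average each row's OOB predictions.
seen = oob_count > 0
oob_pred = oob_sum[seen] / oob_count[seen]
oob_mse = np.mean((y[seen] - oob_pred) ** 2)
print(f"Out-of-bag MSE estimate: {oob_mse:.2f}")

# Aggregated prediction for new data: average over all fitted trees.
def bagged_predict(X_new):
    return np.mean([t.predict(X_new) for t in trees], axis=0)
```

Averaging over the trees reduces variance relative to any single tree, and the out-of-bag error gives a built-in estimate of predictive performance without a separate validation set.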