ABSTRACT

Statistical learning is concerned with the use of statistical and computational models to identify patterns in data and to make predictions from those patterns. It combines methods from statistics and machine learning, and its methods can be broadly categorized as supervised or unsupervised. Much machine learning research aims primarily at accurate prediction, whereas statistical (including Bayesian) inference is better suited to understanding the mechanisms underlying the data and quantifying their uncertainty. The mlr package facilitates the use of resampling techniques in combination with the most popular statistical learning techniques, including linear regression, semi-parametric models such as generalized additive models, and machine learning techniques such as random forests, support vector machines, and boosted regression trees. Machine learning algorithms often require hyperparameters to be set, and their optimal ‘tuning’ can require thousands of model runs, which demands substantial computational resources in the form of time, RAM and/or processor cores.