ABSTRACT

Like exploratory data analysis, the machine learning (ML) process is highly iterative and heuristic-based. With minimal knowledge of the problem or data at hand, it is difficult to know in advance which ML method will perform best. Approaching ML modeling correctly therefore means approaching it strategically: spending our data wisely on learning and validation procedures, properly preprocessing the feature and target variables, minimizing data leakage, tuning hyperparameters, and assessing model performance. The R ecosystem provides a wide variety of ML algorithm implementations, and many of them include class-weighting schemes that remedy class imbalances internally. Bootstrapping, by contrast, is typically an internal resampling procedure that is naturally built into certain ML algorithms, such as bagged trees and random forests. Hyperparameters are the “knobs to twiddle” that control the complexity of ML algorithms and, therefore, the bias-variance trade-off.