ABSTRACT

Parallelization, i.e., running computations simultaneously on multiple CPU cores, CPUs, or computational nodes, is often required to run machine learning experiments efficiently. This chapter begins by demonstrating how mlr3 uses the future package for parallelization and how different ‘plans’ can be applied to mlr3 experiments.
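To make this concrete, the following is a minimal sketch of how a parallelization plan might be combined with an mlr3 resampling experiment; the task, learner, and worker count are arbitrary choices for illustration:

```r
library(mlr3)
library(future)

# Choose a parallelization 'plan': here, four local background R processes
future::plan("multisession", workers = 4)

# A small example experiment; the resampling iterations run in parallel
task = tsk("sonar")
learner = lrn("classif.rpart")
resampling = rsmp("cv", folds = 3)

rr = resample(task, learner, resampling)
rr$aggregate()
```

Swapping the plan (e.g. `"sequential"` or `"multicore"`) changes where the work runs without touching the experiment code.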

In large machine learning experiments, it is common for a model to fail during training or prediction. Because the algorithms have to process arbitrary data, not every eventuality can be handled in advance. It is therefore imperative to have robust methods for encapsulating and dealing with errors. This chapter builds on the brief introduction in Chapter 5 to discuss error handling and logging, including how to make use of fallback learners in experiments.
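As a taste of what the chapter covers, the sketch below shows one common pattern: encapsulating a deliberately failing debug learner and attaching a featureless fallback. The exact API depends on the mlr3 version; the field assignments shown here are used in older releases, while newer releases provide an `$encapsulate()` method instead:

```r
library(mlr3)

# A debug learner that errors during training 50% of the time
learner = lrn("classif.debug", error_train = 0.5)

# Isolate training/prediction so errors are caught and logged,
# and fall back to a featureless learner when the model fails
learner$encapsulate = c(train = "evaluate", predict = "evaluate")
learner$fallback = lrn("classif.featureless")

rr = resample(tsk("sonar"), learner, rsmp("cv", folds = 3))
rr$errors  # inspect which iterations failed and why
```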

Large experiments may also require data to be handled in different formats, or to avoid loading all the data into memory at once. This chapter discusses the different ‘backends’ that can be used for mlr3 Tasks, including interfacing with DuckDB and SQL databases.
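For illustration, here is a minimal sketch of constructing a task on an out-of-memory backend, assuming the `as_duckdb_backend()` convenience function from the mlr3db package; the dataset is an arbitrary example:

```r
library(mlr3)
library(mlr3db)

# Store the data in a DuckDB database on disk rather than in memory
backend = as_duckdb_backend(iris)

# Tasks built on this backend query rows lazily as needed
task = as_task_classif(backend, target = "Species")
task$head(3)
```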

Finally, this chapter demonstrates how to extend classes in mlr3 by using the Measure class as an example. This may be of particular interest to readers who want to create new Measures or Learners.
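To illustrate the extension mechanism, the following sketch defines a hypothetical custom measure (plain classification accuracy) by inheriting from MeasureClassif; the class name and id are made up for this example:

```r
library(mlr3)
library(R6)

# A hypothetical custom measure: plain classification accuracy
MeasureMyAccuracy = R6Class("MeasureMyAccuracy",
  inherit = mlr3::MeasureClassif,
  public = list(
    initialize = function() {
      super$initialize(
        id = "classif.my_accuracy",  # made-up id for illustration
        range = c(0, 1),
        minimize = FALSE,
        predict_type = "response"
      )
    }
  ),
  private = list(
    # .score() receives a Prediction object and returns a single number
    .score = function(prediction, ...) {
      mean(prediction$truth == prediction$response)
    }
  )
)

measure = MeasureMyAccuracy$new()
rr = resample(tsk("sonar"), lrn("classif.rpart"), rsmp("holdout"))
rr$aggregate(measure)
```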