ABSTRACT

This chapter examines the data landscape and the general data requirements for developing quantitative toxicity models. Optimal cancer treatment requires maximising the chance of tumour eradication while simultaneously minimising the risk of adverse treatment-induced side effects. Real-world data refers to information about patients, medical interventions and clinical findings that has been derived from routine procedures in the standard-of-care setting. The most crucial factor determining the potential success of a predictive toxicity model is the availability and quality of outcomes data. Data sharing enables either a larger combined data corpus for training and cross-validation, or the option of independently validating a model developed on one corpus against another. An increasing amount of data is being generated across different fields of modern medicine, including medical imaging, transcriptomics, metabolomics and proteomics. For a distributed learning methodology to work, the local data must be encoded in a format that is fully machine-readable and machine-understandable.