ABSTRACT

Modelling is rarely an automated process by which one transforms a raw set of data into a set of results. Before any modelling can be attempted, a significant amount of exploratory work often needs to be carried out on the raw data. For lack of a standard term, we call this exploratory work ‘data preparation’ (Figure 10.1). This is often an open-ended exercise but will normally involve the following activities:

• Checking data to determine whether there are errors or anomalies
• Summarising data to provide a bird’s-eye view of the risk and understand obvious trends/anomalies
• Preparing a standardised input for the rest of the risk costing process, in particular for the frequency and the severity analysis
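
As a rough illustration of these three activities, the following Python sketch runs them on a tiny made-up claims listing. Everything in it is an assumption introduced for illustration: the field names (`claim_id`, `loss_date`, `reported`, `paid`, `outstanding`), the figures, the two checks chosen, and the decision to set flagged records aside. In practice the checks depend on the data set at hand, and anomalies would normally be queried with whoever supplied the data rather than silently dropped.

```python
from collections import defaultdict
from datetime import date

# Hypothetical claims listing; field names and values are illustrative only.
claims = [
    {"claim_id": "C1", "loss_date": date(2019, 3, 1), "reported": date(2019, 4, 2),
     "paid": 1200.0, "outstanding": 0.0},
    {"claim_id": "C2", "loss_date": date(2019, 7, 9), "reported": date(2019, 6, 1),
     "paid": 500.0, "outstanding": 250.0},
    {"claim_id": "C3", "loss_date": date(2020, 1, 15), "reported": date(2020, 2, 1),
     "paid": -40.0, "outstanding": 0.0},
    {"claim_id": "C4", "loss_date": date(2020, 5, 20), "reported": date(2020, 5, 30),
     "paid": 0.0, "outstanding": 3000.0},
]


def find_anomalies(records):
    """Activity 1: flag records that fail simple consistency checks."""
    found = []
    for r in records:
        if r["reported"] < r["loss_date"]:
            found.append((r["claim_id"], "reported before loss date"))
        if r["paid"] < 0 or r["outstanding"] < 0:
            found.append((r["claim_id"], "negative amount"))
    return found


def summarise_by_year(records):
    """Activity 2: claim count and total incurred (paid + outstanding) by loss year."""
    totals = defaultdict(lambda: {"count": 0, "incurred": 0.0})
    for r in records:
        year = r["loss_date"].year
        totals[year]["count"] += 1
        totals[year]["incurred"] += r["paid"] + r["outstanding"]
    return dict(totals)


def standardise(records, flagged_ids):
    """Activity 3: one row per claim in a fixed layout for later analysis.

    Flagged records are set aside here pending investigation (an assumption
    of this sketch, not a recommendation).
    """
    return [
        {
            "claim_id": r["claim_id"],
            "loss_year": r["loss_date"].year,
            "incurred": r["paid"] + r["outstanding"],
        }
        for r in records
        if r["claim_id"] not in flagged_ids
    ]


anomalies = find_anomalies(claims)
summary = summarise_by_year(claims)
clean = standardise(claims, {claim_id for claim_id, _ in anomalies})

for claim_id, issue in anomalies:
    print(f"{claim_id}: {issue}")
for year in sorted(summary):
    print(year, summary[year])
```

The summary is deliberately computed on all records, flagged or not, since at this stage the aim is to see the data as received; only the standardised output excludes the flagged claims.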

To keep things as concrete as possible, let us illustrate data preparation using a real-world data set, in which amounts and names have been disguised beyond recognition for data privacy purposes. The policyholder (or prospective policyholder) discussed here is in facilities management, and the policy is public liability.