ABSTRACT

In our treatment of generalised linear models (GLMs) in Chapter 26, we encountered two different types of rating factors:

a. Continuous variables such as the annual mileage or the driver’s age, which may or may not be related monotonically to the modelled variable (e.g. the claim frequency)

b. Categorical variables (i.e. variables with a limited number of possible values) that in general have no inherent order: examples are sex, occupation, car model and geographical location

Categorical variables are straightforward to model when the number of levels is small, as is the case with sex. However, they pose a particular challenge when the number of levels is very large and there is no obvious way of ordering them, as is the case with car models* or postcodes. The calibration of a GLM then becomes problematic because, for some of the levels (in our example, some of the car models), loss data may be scarce, so the corresponding parameters can only be estimated with large uncertainty. Also, the lack of a natural ordering means that there is no obvious way of merging categories (as we can do with ages, by grouping adjacent values into bands) to obtain categories with more data.
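To make the data-scarcity problem concrete, here is a minimal Python sketch. It assumes a Poisson claim-count model in which the categorical factor is the only rating factor, so the fitted frequency for each level is simply the number of claims divided by the exposure. For a level with N claims, the estimated standard error of that frequency is sqrt(N)/exposure, i.e. a relative standard error of 1/sqrt(N). The car-model names, exposures, claim counts and the 10% reliability threshold below are all hypothetical, chosen only for illustration.

```python
import math


def level_frequency_estimates(data):
    """Per-level Poisson estimate of claim frequency and its relative standard error.

    data maps each level (e.g. a car model) to (exposure, claim_count).
    With the level as the only factor, the Poisson MLE for a level is
    claims / exposure; its estimated standard error is sqrt(claims) / exposure,
    so the relative standard error is 1 / sqrt(claims).
    Levels with no claims get a frequency of 0 and an infinite relative error.
    """
    results = {}
    for level, (exposure, claims) in data.items():
        freq = claims / exposure
        rel_se = 1 / math.sqrt(claims) if claims > 0 else math.inf
        results[level] = (freq, rel_se)
    return results


def sparse_levels(estimates, max_rel_se=0.1):
    """Return, sorted, the levels whose relative standard error exceeds max_rel_se.

    The default threshold of 10% is a hypothetical choice for illustration.
    """
    return sorted(level for level, (_, rel_se) in estimates.items() if rel_se > max_rel_se)


if __name__ == "__main__":
    # Hypothetical data: (exposure in vehicle-years, number of claims)
    portfolio = {
        "model_A": (12000.0, 960),  # well-populated level
        "model_B": (150.0, 9),      # thinly populated level
        "model_C": (20.0, 0),       # almost no data
    }
    est = level_frequency_estimates(portfolio)
    for level, (freq, rel_se) in est.items():
        print(f"{level}: frequency={freq:.3f}, relative SE={rel_se:.1%}")
    print("Levels with unreliable estimates:", sparse_levels(est))
```

With these made-up figures, model_A's frequency is pinned down to within about 3%, whereas model_B's estimate has a relative standard error of about 33% and model_C has no claims at all, so its estimated frequency of zero carries no useful information. Because car models have no natural order, there is no obvious neighbour with which to merge model_B or model_C, which is exactly the difficulty described above.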