ABSTRACT

Missing values arise for many reasons: a sensor fails, different parts of a sample are recorded to different standards, or the observed objects differ structurally, so that not every attribute can be recorded for every instance. A wind sensor might stop recording after being damaged in a thunderstorm, different hospitals might record different properties of a patient’s history, and a survey on cars cannot state the number of cylinders for a car with a rotary engine. Figure 8.1 shows missing value plots for the Augsburg dataset from case study G. In a missing value plot, a bar is drawn for each variable that has missing values. The left part of the bar represents the proportion of observed cases, and the right part shows the proportion of missing values for that variable. As in all area-based plots, the bars representing counts of missing values are drawn in white. The left plot of Figure 8.1 shows the initial setting, listing all variables in the order they appear in the dataset. The right plot of Figure 8.1 is sorted according to the number of missing values.

Figure 8.1: Missing value plots for the Augsburg data. Left: initial variable order. Right: variables ordered according to the number of missing values per variable.
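
The quantities behind a missing value plot can be sketched in a few lines of Python. This is a minimal illustration, not the software used to draw Figure 8.1: the function names `missing_value_summary` and `render_bars`, the toy data, and the use of `None` to mark a missing value are all assumptions made for the example. The sorted variant assumes descending order by number of missing values, which the text does not specify.

```python
from typing import Dict, List, Optional, Tuple

Summary = List[Tuple[str, int, float]]


def missing_value_summary(
    data: Dict[str, List[Optional[float]]],
    sort_by_missing: bool = False,
) -> Summary:
    """Return (variable, n_missing, proportion_missing) for each variable
    that has at least one missing value (marked here by None)."""
    summary: Summary = []
    for name, values in data.items():
        n_missing = sum(1 for v in values if v is None)
        if n_missing > 0:  # only variables with missing values get a bar
            summary.append((name, n_missing, n_missing / len(values)))
    if sort_by_missing:
        # Assumption: most missing values first; ties keep dataset order.
        summary.sort(key=lambda row: row[1], reverse=True)
    return summary


def render_bars(summary: Summary, width: int = 20) -> List[str]:
    """Draw one text bar per variable: '#' for observed cases on the left,
    '.' (standing in for white) for missing values on the right."""
    lines = []
    for name, _, proportion in summary:
        n_missing_cells = round(proportion * width)
        bar = "#" * (width - n_missing_cells) + "." * n_missing_cells
        lines.append(f"{name:<10} {bar}")
    return lines


# Hypothetical toy data, not the Augsburg dataset.
toy = {
    "wind": [1.0, None, None, 4.0],
    "temp": [11.0, 12.0, 13.0, 14.0],   # complete, so no bar is drawn
    "cylinders": [4.0, None, 6.0, 8.0],
    "humidity": [None, None, None, 0.6],
}

print("\n".join(render_bars(missing_value_summary(toy))))
print()
print("\n".join(render_bars(missing_value_summary(toy, sort_by_missing=True))))
```

The first printout corresponds to the left panel (dataset order), the second to the right panel (ordered by number of missing values). Complete variables such as `temp` are omitted, matching the rule that a bar is drawn only for variables with missing values.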