Data and Information Quality Issues | 13 | v2

ABSTRACT

Let us start with a small thought experiment. To begin with, we should recognize that it is highly unlikely that any data set is 100% accurate. Then, suppose that in the example landscape we have been using in previous chapters (e.g., Chapter 2, Figure 2.5), each layer is 90% (0.9) accurate. If we were to combine the geology and the land cover layers in an overlay, we would end up with a map that is [0.9 and 0.9] accurate, which in probability terms (assuming the errors in the two layers are independent) would be 0.9 × 0.9 = 0.81, or 81% correct. Add another layer to the overlay and the result might theoretically be only 73% correct (0.9 × 0.9 × 0.9 ≈ 0.73), and by the time we have used seven different layers in the analysis, our output product might be less than 50% correct. What, then, if this final map were used as input data for an environmental simulation? Of course, things are unlikely to be quite this bad in practice and, besides, plenty of errors (often unnoticed) were made in using traditional paper maps. Nevertheless, a good understanding of data quality issues is key to the informed use of geographical information systems (GIS) and environmental modeling. This chapter focuses on issues of spatial data quality, as these pose special problems in addition to those encountered in nonspatial data. As we saw in Chapter 3, spatial data quality is a fundamental concern of geo-information (GI) science. While considerable research is ongoing in this area, there is already a sizeable literature. For greater detail than provided here, the reader can refer to: Goodchild and Gopal (1989), Burrough and Frank (1996), Burrough and McDonnell (1998), Shi et al. (2002), and Brown and Heuvelink (2008) for GIS perspectives; Heuvelink (1998) for an in-depth GIS and environmental modeling perspective; Li et al. (2000) for a process model perspective; and Elith et al. (2002), McIntosh (2003), and Lowry et al. (2008) for an ecological perspective.
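The arithmetic of the thought experiment can be reproduced with a short Python sketch. It is illustrative only: it assumes that errors in each layer are statistically independent, so that layer accuracies simply multiply, and the function names `overlay_accuracy` and `layers_until_below` are hypothetical helpers introduced here rather than part of any GIS package.

```python
def overlay_accuracy(layer_accuracies):
    """Theoretical accuracy of an overlay, assuming independent layer errors.

    Under the independence assumption the combined accuracy is the
    product of the individual layer accuracies.
    """
    combined = 1.0
    for accuracy in layer_accuracies:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError("each accuracy must lie between 0 and 1")
        combined *= accuracy
    return combined


def layers_until_below(accuracy, threshold):
    """Number of equally accurate layers needed before the overlay
    accuracy drops below the given threshold."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")
    count = 0
    combined = 1.0
    while combined >= threshold:
        combined *= accuracy
        count += 1
    return count


if __name__ == "__main__":
    for n in (2, 3, 7):
        print(n, "layers:", round(overlay_accuracy([0.9] * n), 3))
    print("layers needed to fall below 50%:", layers_until_below(0.9, 0.5))
```

With every layer at 0.9, the sketch gives 0.81 for two layers, about 0.729 for three, and about 0.478 for seven, which matches the figures quoted in the text. In real data, errors are often spatially correlated between layers, so the product should be read as a theoretical illustration rather than a prediction.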