ABSTRACT

Geographers have long demonstrated the importance of location in health research. Spatial models are employed in diverse applications such as cluster analysis (DeLuca and Kanaroglou, 2008; Zimmerman and Lin, 2010), disease surveillance (Gumpertz et al., 2006; Waller and Gotway, 2004), exposure assignment (Dolk et al., 2000; Finkelstein et al., 2003), accessibility to health care (Passalent et al., 2013), allocation of subsidies to underserviced areas (Sorensen et al., 2000) and studies of obesity and the built environment (Merchant et al., 2011). These applications are often made possible through some form of automated geocoding of addresses. Geocoding is the process of matching a text-based address to positional information (for example, latitude and longitude, or some other form of coordinates). This process can be carried out in a variety of ways, including the use of postal code conversion files (PCCFs), address-based matching found in GIS packages, value-added utilities such as DMTI’s GeoPinpoint Suite (DMTI Spatial, 2007), and internet-based approaches such as the ArcGIS Online geocoding service (ESRI, 2012) or the Google or Yahoo geocoding APIs (application programming interfaces), accessed either directly or through sites such as www.batchgeo.com or www.spatialepidemiology.net, which are built on Google. All approaches involve the input of a standardized address or postal code (source data) and a reference data set (typically a street network file) against which the address is iteratively compared to calculate geographic coordinates.
The calculation is generally based on interpolation along a street segment for which the geographic coordinates of the beginning and end points are known, and/or areal interpolation within a parcel, ZIP code, or city polygon (Jacquez, 2012). The accuracy of a geocode is therefore directly related to the quality of both the source data supplied and the reference data used. Quite often, full address information is incorrect, missing, or suppressed for confidentiality reasons. In such situations, postal codes are often used either as the sole source of address information or as additional information in a multi-staged geocoding approach, where they increase the number of matches (McElroy et al., 2003).
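The street-segment interpolation described above can be illustrated with a minimal Python sketch. It assumes a reference segment that stores the house numbers and coordinates of its two end points, and it places the address proportionally between them. The function name, parameter names, and coordinates are hypothetical and chosen for illustration only; production geocoders typically also account for street side, address parity, and offsets from the centreline, none of which is modelled here.

```python
from typing import Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def interpolate_address(
    house_number: int,
    from_number: int,
    to_number: int,
    start: Coord,
    end: Coord,
) -> Coord:
    """Estimate the position of a house number along a street segment.

    The segment runs from `start` (house number `from_number`) to `end`
    (house number `to_number`). The estimate is a simple linear
    interpolation in coordinate space, which is an assumption of this
    sketch and is only reasonable for short segments.
    """
    if from_number == to_number:
        # Degenerate address range: fall back to the segment midpoint.
        fraction = 0.5
    else:
        fraction = (house_number - from_number) / (to_number - from_number)

    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number lies outside the segment's address range")

    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return (lat, lon)


# Example with made-up coordinates: number 150 on a segment covering 100 to 200
# falls halfway between the two end points.
position = interpolate_address(150, 100, 200, (43.0, -79.0), (43.002, -79.004))
```

Because the estimate depends entirely on the address range and end-point coordinates stored in the reference file, any error in those values carries directly into the resulting geocode, which is the dependence on reference data quality noted above.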