CONTENTS

8.1 Introduction
    8.1.1 Background
    8.1.2 Computational Chemistry and Graph Theory
8.2 Methods
    8.2.1 Program
    8.2.2 Data
        8.2.2.1 Geographical Area
        8.2.2.2 Deprivation
        8.2.2.3 Standardized Long-Term Limiting Illness for People Aged Less Than 75
        8.2.2.4 Adjacency Information
    8.2.3 Storage of Information
    8.2.4 Queries
        8.2.4.1 Query Patterns
        8.2.4.2 Query Data File
8.3 Results
8.4 Discussion
Acknowledgments
References

ABSTRACT

Pattern identification is an important issue in public health, yet current methods are not designed to identify complex geographical patterns of illness and disease. Graph theory has been used successfully within the field of chemoinformatics to identify complex user-defined patterns, or substructures, within molecules in databases of two-dimensional (2D) and three-dimensional (3D) chemical structures. In this paper we describe a study in which one graph theoretical method, the maximum common substructure (MCS) algorithm, which has been successful in identifying such patterns, has been adapted to identify geographical patterns in public health data. We describe how the RASCAL (RApid Similarity CALculator) program (Raymond and Willett, 2002; Raymond et al., 2002a,b), which uses the MCS method, was utilized to identify user-specified geographical patterns of socioeconomic deprivation and long-term limiting illness. The paper illustrates the use of this method, presents the results from searches in a large database of public health data, and then discusses the potential of graph theory for searching geographically based information.
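
To make the idea of a user-specified geographical pattern concrete, the following minimal sketch treats areas as labelled graph nodes and shared boundaries as edges, then searches for a small query pattern by backtracking. It is a simplified stand-in (plain labelled subgraph matching), not the MCS algorithm and not the RASCAL program. The area identifiers, the "high"/"low" deprivation labels, the adjacency data, and the function name `find_matches` are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy data: each area has a deprivation label, and
# adjacency records which areas share a boundary (undirected).
labels = {"A": "high", "B": "high", "C": "low", "D": "high", "E": "low"}
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "E"},
    "D": {"B", "E"},
    "E": {"C", "D"},
}

# Hypothetical query pattern: one "low" area bordering two "high"
# areas that also border each other (a labelled triangle).
pattern_labels = {0: "low", 1: "high", 2: "high"}
pattern_edges = {(0, 1), (0, 2), (1, 2)}


def find_matches(labels, adjacency, pattern_labels, pattern_edges):
    """Return every mapping of pattern nodes to distinct areas such that
    labels agree and every pattern edge is an adjacency in the map.
    Extra adjacencies between matched areas are allowed."""
    nodes = sorted(pattern_labels)
    matches = []

    def extend(mapping):
        if len(mapping) == len(nodes):
            matches.append(dict(mapping))
            return
        q = nodes[len(mapping)]  # next pattern node to place
        for area in sorted(labels):
            if area in mapping.values() or labels[area] != pattern_labels[q]:
                continue
            # Every pattern edge between q and an already placed node
            # must correspond to a shared boundary in the map.
            consistent = all(
                mapping[p] in adjacency[area]
                for p in mapping
                if (p, q) in pattern_edges or (q, p) in pattern_edges
            )
            if consistent:
                mapping[q] = area
                extend(mapping)
                del mapping[q]

    extend({})
    return matches


matches = find_matches(labels, adjacency, pattern_labels, pattern_edges)
matched_area_sets = {frozenset(m.values()) for m in matches}
```

Because the two "high" nodes of the query are interchangeable, the same set of areas can be reported under more than one mapping; collapsing mappings into sets of areas, as `matched_area_sets` does, removes these symmetric duplicates. An exhaustive search of this kind is only practical for small patterns, which is one reason a dedicated graph-matching program is used for large databases.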