R-trees | 32 | Handbook of Data Structures and Applications

ABSTRACT

Spatial database management systems must be able to store and process large amounts of disk-resident spatial data. Multidimensional data support is needed in many ﬁelds including geographic information systems (GIS), computer aided design (CAD), and medical, multimedia, and scientiﬁc databases. Spatial data operations that need to be supported include spatial joins and various types of queries such as intersection, containment, topological and proximity queries. The challenge, and primary performance objective, for applications dealing with disk-resident data is to minimize the number of disk retrievals, or I/Os, needed to answer a query. Main memory data structures are designed to reduce computation time rather than I/O, and hence are not directly applicable to a disk based environment. Just as the B-tree [13] (Chapter 15) and its variants were proposed to optimize I/O while

processing single dimensional disk resident data, the original R-tree [23] and later variants have been proposed to index disk resident multidimensional data eﬃciently. R-trees are very versatile, in the sense that they can be used with little or no modiﬁca-

tion to tackle many diﬀerent problems related to spatial data. Asymptotically better data structures exist for speciﬁc instances of these problems, but the solution of choice is diﬀerent for diﬀerent types of inputs or queries. Thus, one of the main advantages of R-trees is that the same data structure can be used to handle diﬀerent problems on arbitrary types of multidimensional data. In this chapter we describe the original R-tree proposal and some of its variants. We

analyze the eﬃciency of various operations and examine various performance models. We also describe R-tree based algorithms for additional operations, such as proximity queries and spatial joins. There are many more applications, variants and issues related to R-trees than we can possibly cover in one chapter. Some of the ones we do not cover include parallel and distributed R-trees [9, 27, 46, 55], variants for high dimensional [5, 30, 67] and for spatiotemporal data [31, 48, 49, 53, 54, 62, 63], and concurrency [33, 41]. See Chapter 47 for more on concurrent data structures. Other data structures with similar functionality, such as

and

range, quad and k-d trees, are covered in other chapters of Part IV of this handbook. Some of these (see Chapter 27) are speciﬁcally designed for disk-resident data.