ABSTRACT

In this chapter we consider the following problem: given a set P of points in a high-dimensional space, construct a data structure that given any query point q finds the point in P closest to q. This problem, called nearest neighbor search 1 , is of significant importance to several areas of computer science, including pattern recognition, searching in multimedia data, vector compression [[GG91]], computational statistics [[DW82]], and data mining. Many of these applications involve data sets that are very large (e.g., a database containing Web documents could contain over one billion documents). Moreover, the dimensionality of the points is usually large as well (e.g., in the order of a few hundred). Therefore, it is crucial to design algorithms that scale well with the database size as well as with the dimension.