ABSTRACT

When querying in a massive dataset, many searching methods generate a high-dimensional vector for each object and then conduct the k-nearest-neighbor searching.[1] However, such a method is not efficient when the dataset size is very large and the dimension is very high. Other methods relying on a tree structure (such as kd-trees, BDD-trees, and vp-trees) require substantial memory space and time.[2,3]

Sometimes, they are even less efficient than the linear search approach that compares a query record with each record in the dataset one at a time. Moreover, all these methods compare a query with records during the searching process to locate similar records, degrading the searching performance.