Learning indexing method for massive high-dimensional data based on partitioned hierarchical graph
Learning to index is the key to solving the problem of approximate nearest neighbor search in massive high-dimensional data.However,existing learning to index techniques are limited to individ-ual partitions and rely on the construction of neighborhood graph.As the dimensionality and scale of da-ta grow,indexing struggles to accurately judge boundary data of partitions,leading to increased con-struction time complexity and challenges in scalability.To address the above problems,a learn to index method based on partitioned hierarchical graphs,PBO-HNSW is proposed.The method redistributes partition boundary data and constructs distributed graph index structures in parallel.It effectively ad-dresses the challenges faced by the approximate nearest neighbor search problem.Experimental results show that PBO-HNSW method is able to achieve millisecond index construction on millions of massive high-dimensional data.When the recall is 0.93,the construction time of the PBO-HNSW method is only 36.4%of baseline methods.
approximate nearest neighbor searchlearning to indexhierarchical navigable small world(HNSW)partition learningindex structure