计算机工程与科学2024,Vol.46Issue(7) :1193-1201.DOI:10.3969/j.issn.1007-130X.2024.07.007

基于分区层次图的海量高维数据学习索引构建方法

Learning indexing method for massive high-dimensional data based on partitioned hierarchical graph

华悦琳 周晓磊 范强 王芳潇 严浩
计算机工程与科学2024,Vol.46Issue(7) :1193-1201.DOI:10.3969/j.issn.1007-130X.2024.07.007

基于分区层次图的海量高维数据学习索引构建方法

Learning indexing method for massive high-dimensional data based on partitioned hierarchical graph

华悦琳 1周晓磊 2范强 2王芳潇 2严浩2
扫码查看

作者信息

  • 1. 南京信息工程大学计算机学院、网络空间安全学院,江苏 南京 210044;国防科技大学第六十三研究所,江苏 南京 210007;国防科技大学大数据与决策实验室,湖南 长沙 410073
  • 2. 国防科技大学第六十三研究所,江苏 南京 210007;国防科技大学大数据与决策实验室,湖南 长沙 410073
  • 折叠

摘要

学习索引是破解海量高维数据近似最近邻搜索问题的关键.然而,现有学习索引技术结果仅局限于单个分区中,且依赖于近邻图的构建.随着数据维度和规模的增长,索引难以对分区边界数据进行精确判断,并且构建时间复杂度增大,可扩展性难以保障.针对上述问题,提出了基于分区层次图的学习索引方法PBO-HNSW.该方法对分区边界数据进行重新分配,并行构建分布式图索引结构,从而有效应对近似最近邻搜索问题所面临的挑战.实验结果表明,该方法能够在百万级海量高维数据上实现毫秒级的索引构建.当召回率为0.93时,PBO-HNSW方法构建时间仅为基线方法的36.4%.

Abstract

Learning to index is the key to solving the problem of approximate nearest neighbor search in massive high-dimensional data.However,existing learning to index techniques are limited to individ-ual partitions and rely on the construction of neighborhood graph.As the dimensionality and scale of da-ta grow,indexing struggles to accurately judge boundary data of partitions,leading to increased con-struction time complexity and challenges in scalability.To address the above problems,a learn to index method based on partitioned hierarchical graphs,PBO-HNSW is proposed.The method redistributes partition boundary data and constructs distributed graph index structures in parallel.It effectively ad-dresses the challenges faced by the approximate nearest neighbor search problem.Experimental results show that PBO-HNSW method is able to achieve millisecond index construction on millions of massive high-dimensional data.When the recall is 0.93,the construction time of the PBO-HNSW method is only 36.4%of baseline methods.

关键词

近似最近邻搜索/学习索引/层次可导航小世界图/分区学习/索引结构

Key words

approximate nearest neighbor search/learning to index/hierarchical navigable small world(HNSW)/partition learning/index structure

引用本文复制引用

出版年

2024
计算机工程与科学
国防科学技术大学计算机学院

计算机工程与科学

CSTPCD北大核心
影响因子:0.787
ISSN:1007-130X
段落导航相关论文