首页|基于容忍因子的近似最近邻混合查询算法

基于容忍因子的近似最近邻混合查询算法

扫码查看
近似最近邻搜索(ANNS)是计算机领域中一种重要的高效相似度搜索技术,可用于在大规模数据集中进行快速信息检索.随着人们对高精度信息检索的需求不断增长,同时使用结构化信息和非结构化信息进行混合查询的方式也得到了广泛应用.然而,基于近邻图的过滤贪心算法在混合查询时可能会因结构化约束条件的影响导致连通性降低,进而损害搜索精度.为此,提出了一种基于容忍因子的过滤贪心算法,通过容忍因子控制不满足结构化约束条件的顶点参与路由,在不改变索引结构的前提下维持原有近邻图的连通性,克服了结构化约束条件对检索精度的负面影响.实验结果证明,新算法可以在不同结构化约束强度下实现ANNS的高精度搜索,同时保持检索效率.该研究解决了基于近邻图的ANNS在混合查询场景中的问题,为大规模数据集的快速混合查询信息检索提供了一种有效的解决方案.
Approximate nearest neighbor hybrid query algorithm based on tolerance factor
Approximate nearest neighbor search(ANNS)is an important technique in the field of computer science for efficient similarity search,enabling fast information retrieval in large-scale datasets.With the increasing demand for high-precision information retrieval,there is a growing use of the hybrid query method that utilizes both structured and unstructured information for querying,which has broad application prospects.However,filtered greedy algorithms based on nearest neighbor graphs may reduce the connectivity of the graph due to the impact of structural constraints in hybrid queries,ultimately damaging search accuracy.This article proposes a tolerance factor based filtered greedy algorithm,which controls the participation of vertices that do not meet structural constraints in routing through a tolerance factor,maintaining the connectivity of the original nearest neighbor graph without changing the index structure,and overcoming the negative impact of structural constraints on retrieval accuracy.The experimental results demonstrate that the new method can achieve high-precision search for ANNS under different levels of structural constraints,while maintaining retrieval efficiency.This study solves the problem of ANNS based on nearest neighbor graphs in hybrid query scenarios,providing an effective solution for fast hybrid query information retrieval in large-scale datasets.

hybrid queryvector searchnearest neighbor searchfiltered search

贺广福、薛源海、陈翠婷、俞晓明、刘欣然、程学旗

展开 >

中国科学院计算技术研究所,北京 100190

中国科学院大学,北京 101408

北京邮电大学,北京 100876

混合查询 向量检索 最近邻搜索 过滤搜索

国家自然科学基金项目

U21B2046

2024

大数据
人民邮电出版社

大数据

CSTPCD
ISSN:2096-0271
年,卷(期):2024.10(1)
  • 30