大数据2024,Vol.10Issue(1) :17-34.DOI:10.11959/j.issn.2096-0271.2024010

基于容忍因子的近似最近邻混合查询算法

Approximate nearest neighbor hybrid query algorithm based on tolerance factor

贺广福 薛源海 陈翠婷 俞晓明 刘欣然 程学旗
大数据2024,Vol.10Issue(1) :17-34.DOI:10.11959/j.issn.2096-0271.2024010

基于容忍因子的近似最近邻混合查询算法

Approximate nearest neighbor hybrid query algorithm based on tolerance factor

贺广福 1薛源海 2陈翠婷 1俞晓明 1刘欣然 3程学旗1
扫码查看

作者信息

  • 1. 中国科学院计算技术研究所,北京 100190;中国科学院大学,北京 101408
  • 2. 中国科学院计算技术研究所,北京 100190
  • 3. 中国科学院计算技术研究所,北京 100190;北京邮电大学,北京 100876
  • 折叠

摘要

近似最近邻搜索(ANNS)是计算机领域中一种重要的高效相似度搜索技术,可用于在大规模数据集中进行快速信息检索.随着人们对高精度信息检索的需求不断增长,同时使用结构化信息和非结构化信息进行混合查询的方式也得到了广泛应用.然而,基于近邻图的过滤贪心算法在混合查询时可能会因结构化约束条件的影响导致连通性降低,进而损害搜索精度.为此,提出了一种基于容忍因子的过滤贪心算法,通过容忍因子控制不满足结构化约束条件的顶点参与路由,在不改变索引结构的前提下维持原有近邻图的连通性,克服了结构化约束条件对检索精度的负面影响.实验结果证明,新算法可以在不同结构化约束强度下实现ANNS的高精度搜索,同时保持检索效率.该研究解决了基于近邻图的ANNS在混合查询场景中的问题,为大规模数据集的快速混合查询信息检索提供了一种有效的解决方案.

Abstract

Approximate nearest neighbor search(ANNS)is an important technique in the field of computer science for efficient similarity search,enabling fast information retrieval in large-scale datasets.With the increasing demand for high-precision information retrieval,there is a growing use of the hybrid query method that utilizes both structured and unstructured information for querying,which has broad application prospects.However,filtered greedy algorithms based on nearest neighbor graphs may reduce the connectivity of the graph due to the impact of structural constraints in hybrid queries,ultimately damaging search accuracy.This article proposes a tolerance factor based filtered greedy algorithm,which controls the participation of vertices that do not meet structural constraints in routing through a tolerance factor,maintaining the connectivity of the original nearest neighbor graph without changing the index structure,and overcoming the negative impact of structural constraints on retrieval accuracy.The experimental results demonstrate that the new method can achieve high-precision search for ANNS under different levels of structural constraints,while maintaining retrieval efficiency.This study solves the problem of ANNS based on nearest neighbor graphs in hybrid query scenarios,providing an effective solution for fast hybrid query information retrieval in large-scale datasets.

关键词

混合查询/向量检索/最近邻搜索/过滤搜索

Key words

hybrid query/vector search/nearest neighbor search/filtered search

引用本文复制引用

基金项目

国家自然科学基金项目(U21B2046)

出版年

2024
大数据
人民邮电出版社

大数据

CSTPCD
ISSN:2096-0271
参考文献量30
段落导航相关论文