基于倒排索引的正则路径查询算法
Efficient matching algorithm for regular path query based on inverted index
夏秀峰 1孙翔天 1孙尧 2邓国鹏 2朱康 2邱涛1
作者信息
- 1. 沈阳航空航天大学计算机学院,辽宁沈阳 110136
- 2. 沈阳飞机工业(集团)有限公司试飞站,辽宁沈阳 110034
- 折叠
摘要
对于图数据上的正则路径查询(regular path query,RPQ)问题,其使用正则表达式定义图中两个节点之间的约束.针对现有的RPQ在图上遍历匹配方法效率低下这一问题,提出一种基于倒排索引的RPQ算法,在图上构建标签的倒排索引,匹配过程中快速检索标签的相应倒排列表.设计的IRPQ算法将查询转化为面向倒排列表的查询计划树,经过优化以减少冗余列表合并操作.在真实数据集上进行了实验,其结果表明,IRPQ及其优化算法相比现有方法显著提高了查询性能.
Abstract
Aiming the regular path query(RPQ)problem on graph data,in which regular expressions are used to define the con-straints between two nodes in the graph,a RPQ algorithm based on inverted index was proposed to address the low efficiency issue of existing RPQ matching methods on graphs.An inverted index of labels was constructed on the graph,and it was used to quickly obtain posting lists of labels during RPQ processing to complete matching on the inverted lists.The designed IRPQ algo-rithm was used to convert the query into a query plan tree oriented to the inverted list and it was optimized to reduce redundant list merging operations.Experiments on real datasets show that the proposed IRPQ algorithm and its optimization methods sig-nificantly improve query performance compared to existing methods.
关键词
属性图模型/正则路径查询/倒排索引/查询计划树/树结构递归/启发式算法/查询树优化Key words
property graph model/regular path query/inverted index/query plan tree/tree structure recursion/heuristic algo-rithm/query tree optimization引用本文复制引用
基金项目
国家自然科学基金项目(62002245)
科技部国家重点研发计划课题基金项目(2021YFB01)
辽宁省自然科学基金项目(2022-BS-218)
出版年
2024