基于国产DCU异构平台的图匹配算法移植与优化
Transplantation and Optimization of Graph Matching Algorithm Based on Domestic DCU Heterogeneous Platform
郝萌 1田雪洋 1鲁刚钊 2刘义 3张伟哲 1何慧1
作者信息
- 1. 哈尔滨工业大学计算学部 哈尔滨 150001
- 2. 中国电子科技南湖研究院 浙江嘉兴 314001
- 3. 中国科学院大学计算机科学与技术学院 北京 100049
- 折叠
摘要
子图匹配是一种基础的图算法,被广泛应用于社交网络、图神经网络等众多领域.随着图数据规模的增长,人们迫切需要高效的子图匹配算法.GENEVA是一种基于GPU的并行子图匹配算法,其利用区间索引的图存储结构和并行匹配优化方法,能够大幅度减少存储开销,提升子图匹配性能.但由于平台底层硬件架构和编译环境的不同,GENEVA无法直接应用到国产DCU异构平台.为了解决该问题,提出了 GENEVA面向国产DCU的移植和优化方案.IO时间开销是GENEVA算法主要的性能瓶颈,文中采用锁页内存、预加载、调度器3种优化策略来突破该瓶颈.其中,锁页内存技术避免了从可分页内存到临时锁页内存的额外数据传输,在DCU平台上大幅度减少了 IO传输的时间开销;预加载技术将IO数据传输与DCU核函数计算重叠,掩盖了 IO时间开销;调度器在满足预加载需求的同时,减少了冗余数据的传输.在3个不同规模的真实数据集上进行实验,结果表明,采用优化策略后算法性能显著提高.在92.6%的测试用例上,经过优化的GENEVA-HIP算法在国产DCU平台的执行时间比移植前的GENEVA算法在GPU服务器的执行时间短.在较大规模的数据集上,优化的GENEVA-HIP算法在DCU平台上的执行时间相比移植前的GENEVA算法在GPU服务器的执行时间减少了 52.73%.
Abstract
Subgraph matching is a basic graph algorithm that is widely used in various fields such as social networks and graph neural networks.As the scale of graph data grows,there is an increasing need for efficient subgraph matching algorithms.GENE-VA is a GPU-based parallel subgraph matching algorithm.It uses the interval index graph storage structure and parallel matching optimization method to greatly reduce storage overhead and improve subgraph matching performance.However,due to the diffe-rence in the underlying hardware architecture and compilation environment of the platform,GENEVA cannot be directly applied to the domestic DCU platform.In order to solve this problem,this paper proposes GENEVA's transplantation and optimization scheme for domestic DCU.IO time consumption is the main performance bottleneck of GENEVA algorithm.This paper proposes three optimization strategies of page-locked memory,preloading,and scheduler to alleviate this bottleneck.Among them,page-locked memory technology avoids additional data transfer from pageable memory to temporary page-locked memory,and greatly reduces the time consumption of IO transfer on the DCU platform.The preloading technology overlaps IO data transmission with DCU kernel function computation to mask IO time consumption.The scheduler reduces redundant data transfer while satisfy pre-loading requirements.In this paper,Experiments are carried out on three real-wo rld datasets of different sizes,and the results show that the algorithm performance is significantly improved after using the optimization strategies.On 92.6%of the test ca-ses,the optimized GENEVA-HIP execution time on the Sugon DCU platform is less than that of the unported GENEVA on the GPU server.On a larger dataset,the execution time of the optimized Geneva-HIP algorithm on the DCU platform is reduced by 52.73%compared with the the pre-port GENEVA algorithm on the GPU server.
关键词
子图匹配/DCU/异构平台/HIP/移植和优化Key words
Subgraph matching/DCU/Heterogeneous platform/HIP/Transplantation and optimization引用本文复制引用
基金项目
国家自然科学基金青年基金(62202123)
国家自然科学基金联合重点基金(U22A2036)
国家重点研发计划(2020YFB1406902)
广东省重点研发计划(2020B0101360001)
光合基金(C类)(202202033706)
出版年
2024