首页|基于国产DCU异构平台的图匹配算法移植与优化

基于国产DCU异构平台的图匹配算法移植与优化

扫码查看
子图匹配是一种基础的图算法,被广泛应用于社交网络、图神经网络等众多领域.随着图数据规模的增长,人们迫切需要高效的子图匹配算法.GENEVA是一种基于GPU的并行子图匹配算法,其利用区间索引的图存储结构和并行匹配优化方法,能够大幅度减少存储开销,提升子图匹配性能.但由于平台底层硬件架构和编译环境的不同,GENEVA无法直接应用到国产DCU异构平台.为了解决该问题,提出了 GENEVA面向国产DCU的移植和优化方案.IO时间开销是GENEVA算法主要的性能瓶颈,文中采用锁页内存、预加载、调度器3种优化策略来突破该瓶颈.其中,锁页内存技术避免了从可分页内存到临时锁页内存的额外数据传输,在DCU平台上大幅度减少了 IO传输的时间开销;预加载技术将IO数据传输与DCU核函数计算重叠,掩盖了 IO时间开销;调度器在满足预加载需求的同时,减少了冗余数据的传输.在3个不同规模的真实数据集上进行实验,结果表明,采用优化策略后算法性能显著提高.在92.6%的测试用例上,经过优化的GENEVA-HIP算法在国产DCU平台的执行时间比移植前的GENEVA算法在GPU服务器的执行时间短.在较大规模的数据集上,优化的GENEVA-HIP算法在DCU平台上的执行时间相比移植前的GENEVA算法在GPU服务器的执行时间减少了 52.73%.
Transplantation and Optimization of Graph Matching Algorithm Based on Domestic DCU Heterogeneous Platform
Subgraph matching is a basic graph algorithm that is widely used in various fields such as social networks and graph neural networks.As the scale of graph data grows,there is an increasing need for efficient subgraph matching algorithms.GENE-VA is a GPU-based parallel subgraph matching algorithm.It uses the interval index graph storage structure and parallel matching optimization method to greatly reduce storage overhead and improve subgraph matching performance.However,due to the diffe-rence in the underlying hardware architecture and compilation environment of the platform,GENEVA cannot be directly applied to the domestic DCU platform.In order to solve this problem,this paper proposes GENEVA's transplantation and optimization scheme for domestic DCU.IO time consumption is the main performance bottleneck of GENEVA algorithm.This paper proposes three optimization strategies of page-locked memory,preloading,and scheduler to alleviate this bottleneck.Among them,page-locked memory technology avoids additional data transfer from pageable memory to temporary page-locked memory,and greatly reduces the time consumption of IO transfer on the DCU platform.The preloading technology overlaps IO data transmission with DCU kernel function computation to mask IO time consumption.The scheduler reduces redundant data transfer while satisfy pre-loading requirements.In this paper,Experiments are carried out on three real-wo rld datasets of different sizes,and the results show that the algorithm performance is significantly improved after using the optimization strategies.On 92.6%of the test ca-ses,the optimized GENEVA-HIP execution time on the Sugon DCU platform is less than that of the unported GENEVA on the GPU server.On a larger dataset,the execution time of the optimized Geneva-HIP algorithm on the DCU platform is reduced by 52.73%compared with the the pre-port GENEVA algorithm on the GPU server.

Subgraph matchingDCUHeterogeneous platformHIPTransplantation and optimization

郝萌、田雪洋、鲁刚钊、刘义、张伟哲、何慧

展开 >

哈尔滨工业大学计算学部 哈尔滨 150001

中国电子科技南湖研究院 浙江嘉兴 314001

中国科学院大学计算机科学与技术学院 北京 100049

子图匹配 DCU 异构平台 HIP 移植和优化

国家自然科学基金青年基金国家自然科学基金联合重点基金国家重点研发计划广东省重点研发计划光合基金(C类)

62202123U22A20362020YFB14069022020B0101360001202202033706

2024

计算机科学
重庆西南信息有限公司(原科技部西南信息中心)

计算机科学

CSTPCD北大核心
影响因子:0.944
ISSN:1002-137X
年,卷(期):2024.51(4)
  • 27