LU Parallel Decomposition Optimization Algorithm Based on Kunpeng Processor (基于鲲鹏处理器的LU并行分解优化算法)

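Abstract (translated from the Chinese): ScaLAPACK (Scalable Linear Algebra PACKage) is a parallel computing software package for distributed-memory MIMD (Multiple Instruction, Multiple Data) parallel computers, and it is widely used in developing parallel applications built on linear algebra operations. During LU factorization, however, the routines in the ScaLAPACK library are not communication-optimal and do not make full use of current parallel architectures. To address this, a parallel LU factorization optimization algorithm based on the Kunpeng processor (Parallel LU Factorization, PLF) is proposed, which achieves load balancing and is adapted to the domestic Kunpeng environment. PLF treats the data in different partitions of different processes differently: a portion of the data held by each process is assigned to the root process for computation and is then scattered by the root process back to the child processes, which helps make full use of CPU resources and balance the load. Tests were run in single-node environments with the Intel 9320R processor and the Kunpeng 920 processor, using Intel MKL (Math Kernel Library) on the Intel platform and the PLF algorithm on the Kunpeng platform. A comparison of the two platforms on solving systems of equations of different sizes shows that the Kunpeng platform has a significant advantage in solving performance. With the number of processes equal to the number of NUMA nodes and a single thread, the optimized computational efficiency averages 4.35% for small-scale problems, a 215% improvement over Intel's 1.38%; 4.24% for medium-scale problems, a 118% improvement over the Intel platform's 1.86%; and 4.24% for large-scale problems, a 113% improvement over Intel's 1.99%.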

English title: LU Parallel Decomposition Optimization Algorithm Based on Kunpeng Processor

English abstract: ScaLAPACK (Scalable Linear Algebra PACKage) is a parallel computing package for distributed-memory MIMD (multiple instruction, multiple data) parallel computers and is widely used to develop parallel applications based on linear algebra operations. However, its LU factorization routines are not communication-optimal and do not take full advantage of current parallel architectures. To address this, a parallel LU factorization optimization algorithm (PLF) based on the Kunpeng processor is proposed; it achieves load balancing and is adapted to the domestic Kunpeng environment. PLF processes the data in different partitions of different processes differently: part of each process's data is allocated to the root process for computation, and the root process then scatters the data back to the sub-processes, which helps fully utilize CPU resources and balance the load. Tests were performed on a single node with Intel 9320R processors and with Kunpeng 920 processors, using Intel MKL (Math Kernel Library) on the Intel platform and the PLF algorithm on the Kunpeng platform. Comparing the two platforms on systems of equations of different scales shows that the Kunpeng platform has a significant performance advantage. With the number of processes equal to the number of NUMA nodes and a single thread, the optimized computing efficiency averages 4.35% at small scale, 215% higher than Intel's 1.38%; 4.24% at medium scale, 118% higher than the Intel platform's 1.86%; and 4.24% at large scale, 113% higher than Intel's 1.99%.
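
For readers unfamiliar with the operation being optimized, the following is a minimal serial sketch of LU factorization with partial pivoting, computing a row permutation P and triangular factors with PA = LU. It is not the PLF algorithm and does not use ScaLAPACK or MKL; the abstract gives no implementation details of PLF. The names `lu_factor` and `matmul` and the 3x3 test matrix are hypothetical, chosen only for illustration.

```python
def lu_factor(a):
    """Return (perm, l, u) such that P*A = L*U, using partial pivoting.

    perm[i] is the index of the row of A that ends up as row i of P*A.
    Illustrative serial sketch only; not the PLF algorithm.
    """
    n = len(a)
    u = [[float(x) for x in row] for row in a]  # working copy, becomes U
    l = [[0.0] * n for _ in range(n)]           # multipliers, becomes L
    perm = list(range(n))

    for k in range(n):
        # Choose the row with the largest magnitude entry in column k.
        pivot = max(range(k, n), key=lambda r: abs(u[r][k]))
        if u[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        if pivot != k:
            # Swap the rows of U, the multipliers stored so far, and perm.
            u[k], u[pivot] = u[pivot], u[k]
            l[k], l[pivot] = l[pivot], l[k]
            perm[k], perm[pivot] = perm[pivot], perm[k]
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            factor = u[i][k] / u[k][k]
            l[i][k] = factor
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]

    for i in range(n):
        l[i][i] = 1.0  # L has a unit diagonal
    return perm, l, u


def matmul(x, y):
    """Plain dense matrix product, used only to check the factorization."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


# Hypothetical example matrix.
a = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
perm, l, u = lu_factor(a)
pa = [a[p] for p in perm]   # rows of A in pivoted order, i.e. P*A
product = matmul(l, u)      # should match pa up to rounding
```

In a distributed setting such as the one the abstract describes, the matrix would be partitioned across processes and the pivot search and row updates would involve communication; this sketch leaves all of that out.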

English keywords:

ScaLAPACK; LU factorization; Parallel computing; MKL

Authors:

Xu He (徐鹤), Zhou Tao (周涛), Li Peng (李鹏), Qin Fangfang (秦芳芳), Ji Yimu (季一木)

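Author affiliations:

School of Computer Science, School of Software, and School of Cyberspace Security, Nanjing University of Posts and Telecommunications, Nanjing 210023

Jiangsu High Performance Computing and Intelligent Processing Engineering Research Center, Nanjing 210023

School of Science, Nanjing University of Posts and Telecommunications, Nanjing 210023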

Chinese keywords:

ScaLAPACK; LU分解 (LU factorization); 并行计算 (parallel computing); MKL

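Funding:

National Natural Science Foundation of China; National Natural Science Foundation of China; Jiangsu Province "Six Talent Peaks" High-Level Talent Project; Jiangsu Province Postgraduate Practice Innovation Program; Jiangsu Province Postgraduate Practice Innovation Program; Huawei Kunpeng Zhongzhi Program (华为鲲鹏众智计划); Huawei Kunpeng Zhongzhi Program (华为鲲鹏众智计划)

Project numbers:

62102194; 62102196; RJFW-111; SJCX22_0267; SJCX22_0275; 2022外241; 2022外243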

Publication year:

2024

DOI:

10.11896/jsjkx.230900079

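Journal: Computer Science (计算机科学)

Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)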

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.944

ISSN: 1002-137X

Year, volume (issue): 2024, 51(9)