

Algorithm Implementation and Optimization of Symmetric Matrix Eigenvalue Solution for FT-M6678
China's independently controllable FT-M6678 platform currently has no implementation of symmetric matrix eigenvalue solution, and the existing math libraries on the platform do not adequately support this class of problems. This study implements and optimizes the symmetric matrix eigenvalue solver (SYEV) for the domestic FT-M6678 processor, extending the platform's linear algebra library. Based on an analysis of the SYEV implementation and its runtime hotspots, three kinds of optimization are applied on FT-M6678: compilation optimization, memory access optimization, and vector parallelization. Compilation optimization uses different compiler options to guide the compiler in accelerating the program; memory access optimization comprises cache optimization and optimized allocation of data and program segments, improving the efficiency of matrix data access; vector parallelization comprises loop unrolling and parallelization with Single Instruction Multiple Data (SIMD) instructions adapted to the FT-M6678 platform, improving computational efficiency. Correctness verification and performance analysis on the FT-M6678 platform show that the algorithm passes the official Linear Algebra PACKage (LAPACK) test set, that the optimizations yield a speedup of up to 58.346 times on FT-M6678, and that the result is up to 2.053 times faster than on the TMS320C6678 platform.
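As an illustration of the loop-unrolling step named in the abstract, the following is a minimal C sketch of a dot product unrolled by four with independent accumulators. It is not the authors' code: the function names `dot_reference` and `dot_unrolled4` are hypothetical, the dot product is only assumed to be representative of the inner loops in an SYEV implementation, and no FT-M6678 SIMD intrinsics are used because the abstract does not name them.

```c
#include <assert.h>
#include <stddef.h>

/* Reference version: one multiply-add per iteration, a single
 * accumulator, so every iteration depends on the previous one. */
double dot_reference(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Unrolled by 4 (hypothetical example kernel). The four accumulators
 * are independent, which exposes instruction-level parallelism and
 * gives a vectorizing compiler a loop body that maps onto SIMD lanes.
 * `restrict` promises the compiler that x and y do not alias. */
double dot_unrolled4(const double *restrict x,
                     const double *restrict y, size_t n)
{
    assert(x != NULL && y != NULL);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    /* Combine the partial sums, then handle the 0 to 3 leftover elements. */
    double sum = (s0 + s1) + (s2 + s3);
    for (; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

Because the unrolled version adds the products in a different order, its result can differ from the reference in the last bits for general floating-point inputs; a correctness check against a test set should therefore use a tolerance rather than exact equality.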

symmetric matrix eigenvalue; FT-M6678 platform; hotspot analysis; cache optimization; vector parallelism

Yu Li (于立), Han Lin (韩林), Luo Youcai (罗有才), Shang Jiandong (商建东)


School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan 450001, China

National Supercomputing Center in Zhengzhou, Zhengzhou, Henan 450001, China


Major Science and Technology Project of Henan Province (No. 221100210600)

2024

Computer Engineering (计算机工程)
East China Institute of Computing Technology; Shanghai Computer Society


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.581
ISSN: 1000-3428
Year, volume (issue): 2024, 50(2)