HETEROGENEOUS PARALLEL OPTIMIZATION OF TEARING FINITE ELEMENT METHODS FOR STRUCTURAL DYNAMICS CALCULATIONS

Nie Ningming¹, Yao Kehan², Zeng Yan³, Feng Yangde², Wang Jue², Li Shunde¹, Zhang Jilin³, Wan Jian³, Lin Kehao³, Gao Yue⁴, Wang Yangang¹, Wang Zongguo¹

Author information

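1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190
2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190; School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018
3. School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018
4. China Institute of Atomic Energy, Beijing 102413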

Abstract

This paper combines the large-scale tearing finite element method with the Newmark integration method to solve structural dynamics problems in parallel at large scale and high fidelity. For heterogeneous platforms, a multi-level load-balancing strategy that combines static and dynamic methods is designed across and within nodes. Across nodes, a graph bisection algorithm that balances subdomain boundaries is used, in line with the way the tearing finite element method partitions subdomain boundaries, to even out the computational load of the subdomains. Within a node, the computational load is optimized dynamically according to the performance differences among the computing units of the heterogeneous platform. Batched matrix-vector multiplication, the core computational kernel, is parallelized with multiple streams to raise the utilization of the heterogeneous platform. These optimizations have been integrated into HARSA-feti, a high-performance numerical simulation software package for structural mechanics. Experiments use a flow-induced vibration simulation of a real reactor nuclear fuel assembly as the test case. The results show that simulation performance improves by more than 71.3%, and that a high-fidelity simulation of a full-core fuel rod assembly at a scale of ten billion mesh cells is achieved for the first time. Relative to 1,000 GPUs, the strong- and weak-scaling parallel efficiencies on 16,000 GPUs reach 74.1% and 81.1%, respectively.
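For readers unfamiliar with the Newmark integration method named in the abstract, the following is a minimal sketch of one implicit Newmark step for a single-degree-of-freedom system m·a + c·v + k·u = f. It is not taken from the paper or from HARSA-feti: the function names `newmark_step` and `simulate` and the parameter choice beta = 1/4, gamma = 1/2 (the average-acceleration variant) are assumptions made for illustration. In a multi-degree-of-freedom setting, the scalar division by `k_eff` becomes a linear solve with the effective stiffness matrix, which is presumably where a tearing finite element solver would be applied.

```python
import math


def newmark_step(m, c, k, f_next, u, v, a, dt, beta=0.25, gamma=0.5):
    """Advance m*a + c*v + k*u = f by one time step (single DOF, illustrative)."""
    # Effective stiffness; for a multi-DOF system this is the matrix to be solved.
    k_eff = k + gamma / (beta * dt) * c + m / (beta * dt**2)
    # Effective load built from the known state at the current step.
    rhs = (
        f_next
        + m * (u / (beta * dt**2) + v / (beta * dt) + (0.5 / beta - 1.0) * a)
        + c * (
            gamma / (beta * dt) * u
            + (gamma / beta - 1.0) * v
            + dt * (0.5 * gamma / beta - 1.0) * a
        )
    )
    u_new = rhs / k_eff  # scalar stand-in for the linear solve
    # Recover acceleration and velocity from the Newmark update formulas.
    a_new = (u_new - u) / (beta * dt**2) - v / (beta * dt) - (0.5 / beta - 1.0) * a
    v_new = v + dt * ((1.0 - gamma) * a + gamma * a_new)
    return u_new, v_new, a_new


def simulate(m, c, k, u0, v0, dt, n_steps):
    """Integrate the unforced system (f = 0) from the initial state (u0, v0)."""
    u, v = u0, v0
    a = (0.0 - c * v - k * u) / m  # initial acceleration from the equation of motion
    for _ in range(n_steps):
        u, v, a = newmark_step(m, c, k, 0.0, u, v, a, dt)
    return u, v, a
```

As a usage example, `simulate(1.0, 0.0, 1.0, 1.0, 0.0, 0.01, 628)` integrates an undamped unit oscillator for roughly one period; the displacement should stay close to `math.cos(6.28)`, and the mechanical energy should remain essentially constant for this parameter choice.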

Key words

Structural dynamics/Large-scale parallel computing/Massively parallel/Load balancing/Heterogeneous computing/Matrix-vector multiplication

Publication year

2024

数值计算与计算机应用 (Journal on Numerical Methods and Computer Applications)

Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Impact factor: 0.188

ISSN: 1000-3266
