HETEROGENEOUS PARALLEL OPTIMIZATION OF TEARING FINITE ELEMENT METHODS FOR STRUCTURAL DYNAMICS CALCULATIONS
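Nie Ningming (聂宁明) 1, Yao Kehan (姚柯寒) 2, Zeng Yan (曾艳) 3, Feng Yangde (冯仰德) 2, Wang Jue (王珏) 2, Li Shunde (李顺德) 1, Zhang Jilin (张纪林) 3, Wan Jian (万健) 3, Lin Kehao (林克豪) 3, Gao Yue (高岳) 4, Wang Yangang (王彦棡) 1, Wang Zongguo (王宗国) 1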
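Author information
- 1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190
- 2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190; School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018
- 3. School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018
- 4. China Institute of Atomic Energy, Beijing 102413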
Abstract
This paper combines the large-scale tearing finite element method with the Newmark integration method to solve structural dynamics problems in parallel at large scale and high resolution. A multi-level load-balancing strategy that combines static and dynamic methods is designed for heterogeneous platforms, operating both between nodes and within each node. Between nodes, a graph bisection algorithm that balances domain boundaries is adopted, based on the characteristics of the subdomain boundaries produced by the tearing finite element partition, to even out the computational load of the subdomains. Within a node, the computational load is optimized dynamically according to the performance differences among the computing units of the heterogeneous platform. The core computing module, batched matrix-vector multiplication, is parallelized with multiple streams to improve utilization of the heterogeneous computing platform. These optimizations have been integrated into HARSA-feti, a high-performance numerical simulation software package for structural mechanics. The experiments use a flow-induced vibration simulation of a real reactor nuclear fuel assembly as the test case. The results show that simulation performance improves by more than 71.3%, and that a high-resolution simulation of a full-core fuel rod assembly at the ten-billion-grid scale is achieved for the first time. Relative to 1,000 GPUs, the strong and weak scaling parallel efficiencies on 16,000 GPUs reach 74.1% and 81.1%, respectively.
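As a rough illustration of the Newmark time integration named in the abstract, the following minimal sketch advances a single-degree-of-freedom oscillator m·a + c·v + k·u = f. It is not taken from HARSA-feti. The function names, the test problem, and the parameter choice beta = 0.25, gamma = 0.5 (the average-acceleration variant) are assumptions made for this example. In a finite element setting m, c, and k become matrices, and the scalar division becomes a linear solve.

```python
import math


def newmark_step(m, c, k, f_next, u, v, a, dt, beta=0.25, gamma=0.5):
    """Advance m*a + c*v + k*u = f by one time step with the Newmark scheme.

    All names here are illustrative; beta and gamma default to the
    average-acceleration variant, which is an assumption of this sketch.
    """
    # Predictors: the part of the update that depends only on the known state.
    u_pred = u + dt * v + (0.5 - beta) * dt**2 * a
    v_pred = v + (1.0 - gamma) * dt * a
    # Enforce the equation of motion at the new time level to get a_next.
    effective_mass = m + gamma * dt * c + beta * dt**2 * k
    a_next = (f_next - c * v_pred - k * u_pred) / effective_mass
    # Correctors: add the contribution of the new acceleration.
    u_next = u_pred + beta * dt**2 * a_next
    v_next = v_pred + gamma * dt * a_next
    return u_next, v_next, a_next


def simulate_free_vibration(omega=2.0 * math.pi, dt=1e-3, n_steps=1000):
    """Integrate an undamped, unforced oscillator released from u = 1, v = 0."""
    m, c, k = 1.0, 0.0, omega**2
    u, v = 1.0, 0.0
    # Initial acceleration from the equation of motion at t = 0.
    a = (0.0 - c * v - k * u) / m
    for _ in range(n_steps):
        u, v, a = newmark_step(m, c, k, 0.0, u, v, a, dt)
    return u, v


if __name__ == "__main__":
    u_end, v_end = simulate_free_vibration()
    print("displacement after one period:", u_end)
    print("velocity after one period:", v_end)
```

With the default arguments the simulated time span equals one natural period, so the displacement should return close to its initial value, which gives a quick sanity check on the step function.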
Keywords
Structural dynamics/Massively parallel computing/Load balancing/Heterogeneous computing/Matrix-vector multiplication
Publication year
2024