GPU-based Algorithm Optimization for Streaming Module of Lattice Boltzmann Method

Abstract: The lattice Boltzmann method (LBM) is a computational fluid dynamics (CFD) method that operates at the mesoscopic scale. It computes on a large number of discrete lattice points, which makes it well suited to parallel execution. A graphics processing unit (GPU) contains a large number of arithmetic logic units and is suited to large-scale parallel computing, so a GPU-based parallel LBM algorithm can improve computational efficiency. However, in the streaming module of the LBM algorithm, the computation at each lattice point requires communication with other lattice points, which creates strong data dependence. This study proposes a GPU-based optimization strategy for the LBM streaming module. First, the implementation logic of the streaming step is analyzed and, through model dimension reduction, the three-dimensional model is decomposed by velocity component into several two-dimensional models, which reduces model complexity. Second, the differences in lattice-point data before and after the streaming computation are analyzed, the communication pattern of the streaming module is identified through data positioning, and the data exchange modes between lattice points are classified. Finally, the classified exchange modes are used to partition the two-dimensional models into regions, and a new data communication scheme is designed that eliminates the effect of data dependence and fully parallelizes the streaming module. In tests, the parallel algorithm achieves a speedup of 1.92 on a grid of size 1.3×10^8, indicating good parallel performance. Compared with an algorithm whose streaming module is not parallelized, the proposed strategy improves parallel computing efficiency by 30%.
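To make the data dependence in the streaming step concrete, the following is a minimal sketch in plain Python. It is not the authors' region-partitioning scheme, which the abstract does not describe in detail. It shows the common double-buffered "pull" formulation on a two-dimensional D2Q9 lattice with periodic boundaries, all of which are assumptions chosen for illustration. The names `VELOCITIES`, `stream_pull`, and `demo` are hypothetical. Because every destination node reads only from the source array, the loop iterations do not depend on one another, which is the property a GPU kernel would exploit.

```python
# D2Q9 discrete velocities (cx, cy): rest, four axis-aligned, four diagonal.
# Assumption: a 2D lattice is used here only to keep the sketch small.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]


def stream_pull(f_src, nx, ny):
    """Return a new lattice after one streaming step (periodic boundaries).

    f_src[i][y][x] is the population travelling along VELOCITIES[i] at
    node (x, y). Each destination entry reads only from f_src, so the
    iterations are independent; on a GPU each (x, y) could be one thread.
    """
    f_dst = [[[0.0] * nx for _ in range(ny)] for _ in VELOCITIES]
    for i, (cx, cy) in enumerate(VELOCITIES):
        for y in range(ny):
            for x in range(nx):
                # Pull from the upstream neighbour:
                # f_i(x, t + 1) = f_i(x - c_i, t)
                f_dst[i][y][x] = f_src[i][(y - cy) % ny][(x - cx) % nx]
    return f_dst


def demo():
    """Stream two marked populations on a 4 x 3 lattice and return the result."""
    nx, ny = 4, 3
    f = [[[0.0] * nx for _ in range(ny)] for _ in VELOCITIES]
    f[1][1][3] = 1.0  # at (x=3, y=1), moving in +x; wraps around to x=0
    f[6][0][0] = 2.0  # at (x=0, y=0), moving along (-1, +1); lands at (3, 1)
    return stream_pull(f, nx, ny)
```

The second buffer doubles the memory footprint. An in-place scheme avoids that cost but has to order or partition the exchanges so that no value is overwritten before it is read, which is the kind of dependence the abstract refers to.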

Keywords:

High Performance Computing (HPC); Lattice Boltzmann Method (LBM); Graphics Processing Unit (GPU); parallel optimization; data rearrangement

Authors:

Huang Bin (黄斌), Liu Anjun (柳安军), Pan Jingshan (潘景山), Tian Min (田敏), Zhang Yu (张煜), Zhu Guanghui (朱光慧)

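Affiliations:

Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong 251013

High Performance Computing Laboratory, Jinan Institute of Supercomputing Technology, Jinan, Shandong 251013

School of Energy Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang 150001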

Funding:

National Natural Science Foundation of China; Shandong Province Key Research and Development Program

Project numbers:

62002186; 2021RZB01002

Year of publication:

2024

DOI：

10.19678/j.issn.1000-3428.0067084

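Journal: Computer Engineering (计算机工程)

East China Institute of Computing Technology (华东计算技术研究所); Shanghai Computer Society (上海市计算机学会)

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.581

ISSN: 1000-3428

Year, volume (issue): 2024, 50(2)

Number of references: 2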