GPU-Based Algorithm Optimization for the Streaming Module of the Lattice Boltzmann Method

The Lattice Boltzmann Method (LBM) is a Computational Fluid Dynamics (CFD) method that operates at the mesoscopic simulation scale. It computes on a large number of discrete lattice points, which makes it well suited to parallel execution. A Graphics Processing Unit (GPU) contains a large number of arithmetic logic units and is suited to large-scale parallel computing, so a GPU-based parallel LBM algorithm can improve computational efficiency. However, in the streaming module of the LBM algorithm, the calculation at each lattice point requires communication with other lattice points, which creates strong data dependence. This study proposes a GPU-based optimization strategy for the LBM streaming module. First, the implementation logic of the streaming step is analyzed and, through model dimension reduction, the three-dimensional model is decomposed by velocity component into several two-dimensional models, which reduces the complexity of the model. Second, the differences in lattice-point data before and after the streaming calculation are analyzed, the communication pattern of the streaming module is identified through data positioning, and the modes of data exchange between lattice points are classified. Finally, the classified exchange modes are used to partition the two-dimensional models into regions, and a new data communication scheme is designed; this eliminates the effect of the data dependence and fully parallelizes the streaming module. In tests, the parallel algorithm achieves a speedup of 1.92 on a grid of 1.3×10⁸ lattice points, indicating good parallel performance. Compared with an algorithm in which the streaming module is not parallelized, the proposed optimization strategy improves parallel computing efficiency by 30%.
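To make the data dependence in the streaming step concrete, the following is a minimal sketch and not the paper's method. It assumes a D2Q9 lattice with periodic boundaries and uses the common two-buffer "pull" scheme, in which every destination entry reads one source entry from the old buffer, so no update depends on another. The region-partitioned exchange scheme described in the abstract is not reproduced here. The names `VELOCITIES`, `stream_pull`, and `make_field` are hypothetical.

```python
# D2Q9 lattice velocities (cx, cy); a standard choice, assumed for illustration.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]


def stream_pull(f, nx, ny):
    """Return a new buffer g with g[i][x][y] = f[i][x - cx][y - cy] (periodic).

    Each destination entry reads exactly one entry of the old buffer and
    writes only its own slot, so every (i, x, y) update is independent and
    could be assigned to its own GPU thread.
    """
    g = [[[0.0] * ny for _ in range(nx)] for _ in VELOCITIES]
    for i, (cx, cy) in enumerate(VELOCITIES):
        # For a fixed direction i, streaming is a uniform shift of one 2D plane.
        for x in range(nx):
            for y in range(ny):
                g[i][x][y] = f[i][(x - cx) % nx][(y - cy) % ny]
    return g


def make_field(nx, ny):
    """Hypothetical test data: each value encodes its direction and position."""
    return [[[i * 10000 + x * 100 + y for y in range(ny)] for x in range(nx)]
            for i in range(len(VELOCITIES))]
```

Updating a single buffer in place would overwrite values that neighboring points still need to read, which is the dependence the abstract refers to. The second buffer avoids this at the cost of doubling the memory for the distribution functions.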

High Performance Computing (HPC); Lattice Boltzmann Method (LBM); Graphics Processing Unit (GPU); parallel optimization; data rearrangement

Huang Bin (黄斌), Liu Anjun (柳安军), Pan Jingshan (潘景山), Tian Min (田敏), Zhang Yu (张煜), Zhu Guanghui (朱光慧)

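Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, Shandong 251013

High Performance Computing Laboratory, Jinan Institute of Supercomputing Technology, Jinan, Shandong 251013

School of Energy Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang 150001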

National Natural Science Foundation of China (62002186); Key Research and Development Program of Shandong Province (2021RZB01002)

Computer Engineering (计算机工程)
East China Institute of Computing Technology; Shanghai Computer Society

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.581
ISSN: 1000-3428
Year, volume (issue): 2024, 50(2)