Computer Engineering (计算机工程), 2024, Vol. 50, Issue 8: 207-215. DOI: 10.19678/j.issn.1000-3428.0067530

Performance Optimization Technique for Large-Scale Parallel Volume Rendering Based on Multiple Rendering Pipelines

Wang Huawei (王华维)¹, Liu Ruoyan (刘若妍)², Ai Zhiwei (艾志玮)¹, Cao Yi (曹轶)¹

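Author information

  • 1. Key Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing 100088; CAEP Software Center for High Performance Numerical Simulation, Beijing 100088
  • 2. Key Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing 100088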

Abstract

For the large-scale scientific data produced by numerical simulations, volume rendering relies on high-density ray sampling to capture complex physical features, which incurs substantial computational overhead and a large increase in data volume. On supercomputers built on domestically developed CPUs, a single processor core delivers less computing power than a commercial CPU core, so more cores must be used to share the volume rendering workload; this creates a scalability bottleneck in the parallel communication of sampled data. To make full use of such supercomputers for efficient volume rendering, this paper proposes a performance optimization technique for large-scale parallel volume rendering based on multiple rendering pipelines, which reduces the parallel scale of each pipeline through two-level parallelism: first across pipelines, then across processes. With this technique, the target image is divided into multiple sub-regions and the rendering processes are grouped accordingly. Each process group independently executes one rendering pipeline to produce its sub-region of the image. Finally, all sub-regions are collected to form the complete image, which is then output. Experiments show that the optimized volume rendering algorithm scales to approximately 10,000 cores on these supercomputers and completes volume rendering tasks effectively.
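The two-level scheme in the abstract (split the image, group the processes, render each sub-region independently, then collect) can be illustrated with a small serial sketch. This is not the authors' implementation: the horizontal-strip partition, the contiguous rank grouping, and all names (`Region`, `split_image`, `group_of`, `render_region`, `render`) are assumptions made for illustration, and the per-pipeline renderer is a stand-in that only tags pixels. In a real MPI code, the grouping step would presumably correspond to splitting the global communicator into one sub-communicator per pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Half-open pixel rectangle [x0, x1) x [y0, y1) of the target image."""
    x0: int
    y0: int
    x1: int
    y1: int


def split_image(width, height, num_pipelines):
    """Cut the image into horizontal strips, one per pipeline (assumed layout)."""
    regions = []
    for p in range(num_pipelines):
        y0 = p * height // num_pipelines
        y1 = (p + 1) * height // num_pipelines
        regions.append(Region(0, y0, width, y1))
    return regions


def group_of(rank, num_procs, num_pipelines):
    """Map a process rank to a pipeline index using contiguous rank blocks."""
    return rank * num_pipelines // num_procs


def render_region(region, group_ranks):
    """Stand-in for one pipeline: tag every pixel with the group's lowest rank."""
    tag = min(group_ranks)
    return [[tag] * (region.x1 - region.x0)
            for _ in range(region.y1 - region.y0)]


def render(width, height, num_procs, num_pipelines):
    """Run every pipeline (serially here) and stack the strips into one image."""
    if not 1 <= num_pipelines <= num_procs:
        raise ValueError("need at least one process per pipeline")
    regions = split_image(width, height, num_pipelines)
    groups = [[] for _ in range(num_pipelines)]
    for rank in range(num_procs):
        groups[group_of(rank, num_procs, num_pipelines)].append(rank)
    image = []
    for region, ranks in zip(regions, groups):
        image.extend(render_region(region, ranks))  # strips stack top to bottom
    return image, groups
```

The point of the grouping is that sampled-data exchange happens only among the ranks of one group, so the communication scale per pipeline is roughly the total process count divided by the number of pipelines; only the final collection of sub-regions involves all groups.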

Key words

volume rendering/multiple pipelines/two-level parallelism/parallel scalability/performance optimization


Funding

National Key Research and Development Program of China (2017YFB0202203)

Publication year

2024
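Computer Engineering (计算机工程)
East China Institute of Computing Technology; Shanghai Computer Society

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.581
ISSN: 1000-3428
Number of references: 3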