
A Set of Performance Profiling Tools for the General Purpose Processors of TianHe New Generation Supercomputing System

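The TianHe new-generation supercomputing system is the successor to TianHe-2 in the TianHe series. The system is planned to adopt a hybrid heterogeneous architecture that pairs general-purpose processors with accelerators, with the general-purpose processors based on the ARM architecture. At present, performance analysis tools for ARM processors are still immature, and tools targeting new-generation supercomputers are scarcer still; their practicality and efficiency do not yet meet programmers' needs. Targeting the general-purpose processors of the TianHe new-generation supercomputing system, this paper designs and develops a performance analysis tool set comprising three sub-tools: cache conflict detection, false sharing detection, and memory defect detection. Running with ordinary user privileges on the TianHe new-generation supercomputing system, the tool set can analyze performance problems within a single node and in multi-node programs with high data parallelism, and can resolve memory problems in programs. The paper applies several performance optimization strategies, including min-write, cache-line alignment padding, and thread access isolation, to improve tool performance; with these strategies the tools' running time can be reduced to as little as 1/20 of the original. It also uses a novel red-zone detection method and a red-zone hiding and recovery mechanism to lower the rate of false errors reported by the tools. A companion visual interface was also developed so that users can visually analyze and process a program's performance analysis data, which improves the tools' practicality and ease of use. The tools add a time overhead of 40~100x and a memory overhead of 100~200x to program execution while ensuring correctness and practicality, and they can improve programming efficiency and program performance on the TianHe new-generation supercomputing system.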
A Set of Performance Profiling Tools for the General Purpose Processors of TianHe New Generation Supercomputing System
The TianHe new-generation supercomputer is the successor to TianHe-2 in the TianHe series. The system is expected to adopt a hybrid heterogeneous architecture of general-purpose processors and accelerators, in which the general-purpose processors use the ARM architecture. At present, performance profiling tools for the ARM architecture are still immature, and those for new-generation supercomputers are scarcer still; their practicability and efficiency do not yet meet programmers' needs. For the general-purpose processors of the TianHe new-generation supercomputing system, this paper designs and develops a set of performance profiling tools comprising cache conflict detection, false sharing detection, and memory defect detection. Running with ordinary user privileges on the TianHe new-generation supercomputer, the tool set can analyze performance problems within a single node and in multi-node programs with high data parallelism, and can resolve memory problems in programs. In particular, the performance problems addressed in this paper mainly concern the cache, which is invisible to programmers. This makes the work significant, because cache-induced performance problems are hard for programmers to uncover by inspecting their code alone. The memory defect detection tool detects five kinds of problems: access to invalid or illegal address space, use-after-free, reads of uninitialized memory, double free, and memory leaks. Several performance optimization strategies, including min-write, cache-line alignment padding, and thread access isolation, are used to improve tool performance, making the tools 1.2 to 20 times faster than the unoptimized versions. Meanwhile, a novel red-zone detection method and a red-zone hiding and recovery mechanism are used to reduce the false error rate reported by the tools. The red-zone detection method places a red zone at the end of each allocated memory region to detect memory access errors. Its design follows from a common pattern in how programmers write code: out-of-bounds array accesses are usually concentrated near the array boundary. The red-zone hiding and recovery mechanism avoids false errors during consecutive memory allocations and thereby further reduces the false error rate. This paper also develops a supporting visual interface with which users can visually analyze and process performance analysis data, improving the utility and usability of the tools. In the experiments, the tool set reveals severe cache contention in OCEAN-ncp from SPLASH-3, a well-known parallel benchmark suite, exposing a large hidden optimization opportunity. The false sharing detection tool is then used to pinpoint the exact context, including source line numbers, where false sharing occurs and causes significant performance degradation. By combining this information and fully exploiting the opportunity, we achieve a 3x speedup of the parallel program OCEAN-ncp. The tools' time and space overheads are about 40~100x and 100~200x, respectively. With moderate overhead, correctness, and practicability, the tools can improve programming efficiency and program performance on the TianHe new-generation supercomputer system.
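To make the red-zone idea concrete, the following is a minimal Python simulation, not the paper's implementation. It assumes a toy bump allocator over a flat address space and a per-byte shadow map; the names `ToyHeap`, `RED_ZONE_SIZE`, `malloc`, and `access` are hypothetical and chosen only for illustration. The abstract does not say how the actual tools intercept memory accesses on ARM, so that part is not modeled here.

```python
RED_ZONE_SIZE = 16  # hypothetical red-zone width in bytes (an assumption)


class ToyHeap:
    """Toy bump allocator with a shadow map marking each byte's state."""

    def __init__(self, size):
        self.size = size
        self.next_free = 0
        # Per-byte state: "free", "alloc" (usable), or "red" (red zone).
        self.shadow = ["free"] * size
        self.errors = []  # recorded (kind, address, state) violations

    def malloc(self, nbytes):
        """Allocate nbytes and place a red zone right after the block."""
        base = self.next_free
        end = base + nbytes
        if end + RED_ZONE_SIZE > self.size:
            raise MemoryError("toy heap exhausted")
        for addr in range(base, end):
            self.shadow[addr] = "alloc"
        for addr in range(end, end + RED_ZONE_SIZE):
            self.shadow[addr] = "red"
        self.next_free = end + RED_ZONE_SIZE
        return base

    def access(self, addr, kind="read"):
        """Check one access; record an error unless the byte is allocated."""
        state = self.shadow[addr] if 0 <= addr < self.size else "invalid"
        if state != "alloc":
            self.errors.append((kind, addr, state))
            return False
        return True


if __name__ == "__main__":
    heap = ToyHeap(256)
    buf = heap.malloc(10)
    heap.access(buf + 9)            # last valid byte of the block
    heap.access(buf + 10, "write")  # one past the end: lands in the red zone
    print(heap.errors)
```

Because overruns tend to fall just past the end of a buffer, a narrow trailing red zone catches the common off-by-one case while adding only `RED_ZONE_SIZE` bytes per allocation. The sketch does not model the hiding and recovery mechanism described in the abstract.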

performance profiling tools; TianHe new generation supercomputing system; false sharing detection; memory defect detection; program optimization

FENG Wentao, LUAN Zhongzhi, YANG Hailong, QIAN Depei


School of Computer Science and Engineering, Beihang University, Beijing 100191

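Keywords: performance analysis tools; TianHe new-generation supercomputing system; false sharing detection; memory defect detection; program optimization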

National Natural Science Foundation of China, General Program (No. 62072018)

2024

Chinese Journal of Computers
China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 3.18
ISSN:0254-4164
Year, volume (issue): 2024, 47(2)
  • 19