An Acceleration Strategy for Operator Generation Based on TVM
With the rapid development of Artificial Intelligence (AI), the continuous emergence of new operators and underlying hardware has increased the workload of developing and maintaining operator libraries. Relying solely on manual optimization to improve the performance and efficiency of AI models can create bottlenecks. The TVM deep learning compiler alleviates the burden of manual optimization through automated code generation, but it suffers from long search times. To address this issue, this study proposes two optimization strategies for Ansor, TVM's automated code generation framework. The first introduces a new cost model based on a gradient boosting algorithm; the second prunes the scheduling space according to predefined rules. Both strategies aim to accelerate TVM's automated code generation, enabling rapid deployment of models and providing more efficient solutions for applying AI technology. Experimental results show that with the optimized cost model, the tuning time of a model on the x86 CPU platform can be reduced by 30% to 35% without degrading inference time, and the performance of individual optimized operators can improve by up to 22%. On the Deep Computing Unit (DCU) platform, tuning time is reduced by approximately 20%, and the average performance of optimized operators improves by 5.7%. In addition, the rule-based pruning strategy effectively improves the convergence speed of the cost model, improving the model's inference performance by 7.4% under the original optimal number of iterations.
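The two strategies above can be illustrated with a minimal sketch. This is not Ansor's actual implementation: the features, the pruning rule, and the use of scikit-learn's gradient boosting regressor are all illustrative assumptions standing in for Ansor's learned cost model and predefined pruning rules.

```python
# Hypothetical sketch of the two strategies: a gradient-boosting cost model
# that ranks candidate schedules, and a rule-based filter that prunes the
# scheduling space before any measurement. All names and rules are
# illustrative, not Ansor's real feature set or rule set.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic schedule features: [tile_size, unroll_factor, vector_width].
features = rng.integers(1, 65, size=(200, 3)).astype(float)
# Synthetic measured latencies (lower is better) used to train the model.
latency = features[:, 0] * 0.5 + 64.0 / features[:, 2] + rng.normal(0, 1, 200)

# Strategy 1: a gradient-boosting cost model predicts latency from features,
# so only promising candidates need to be compiled and measured on hardware.
model = GradientBoostingRegressor(n_estimators=100, max_depth=3)
model.fit(features, latency)

def prune(candidates):
    """Strategy 2: rule-based pruning. Drop schedules whose tile size is
    not a multiple of the vector width (an illustrative predefined rule)."""
    return [c for c in candidates if c[0] % c[2] == 0]

candidates = [(32, 4, 8), (17, 2, 8), (64, 8, 16), (13, 1, 4)]
kept = prune(candidates)
# Rank surviving candidates by predicted latency; measure only the top ones.
scores = model.predict(np.array(kept, dtype=float))
ranked = [c for _, c in sorted(zip(scores, kept))]
print(ranked[0])  # the schedule predicted to be fastest among those kept
```

Pruning shrinks the search space before prediction, while the cost model avoids measuring every surviving candidate; together they shorten the tuning loop, which is the effect the abstract reports.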
Keywords: deep learning compiler; cost model; gradient boosting algorithm; pruning strategy; automatic tuning