GEMM Algorithm Design and Optimization Implementation Technology Based on Feiteng D2000

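Abstract (translated from the Chinese): In deep learning inference frameworks, GEMM is a typical compute-intensive operator. The modules of models such as Bert, Transformer, and Yolo contain a large number of GEMM operations, which directly affect model inference latency. To optimize this operator, loop unrolling, OpenMP, and the NEON instruction set are each applied, and experiments are run on the domestic embedded board Feiteng D2000 under a domestic operating system. The results show a speedup of 43.89 times over the unoptimized version, indicating that the optimizations are effective and can substantially reduce the inference latency of artificial intelligence models at the edge.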

English title: GEMM Algorithm Design and Optimization Implementation Technology Based on Feiteng D2000

English abstract: In deep learning inference frameworks, GEMM is a typical compute-intensive operator. For example, the modules of Bert, Transformer, Yolo, and other models contain a large number of GEMM operations. The quality of the underlying GEMM implementation in a deep learning framework therefore directly affects the inference latency of the model. Because edge embedded platforms have limited computing power, optimizing this operator is crucial. The main work of this article is an embedded optimization of the operator using loop unrolling, OpenMP, the NEON instruction set, and other methods. Experiments were conducted on the domestic embedded board Feiteng D2000 under a domestic operating system. The results show that the optimized operator is 43.89 times faster than before optimization. The optimization method is effective and can greatly reduce the inference latency of artificial intelligence models at the edge.
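The three techniques named in the abstract can be illustrated with a minimal C sketch. This is not the authors' implementation: the function names `gemm_naive` and `gemm_opt`, the row-major single-precision layout, and the i-k-j loop order are all assumptions made for illustration. The NEON path is compiled only when the compiler defines `__ARM_NEON`, with a scalar four-way unrolled loop as the fallback, and the OpenMP pragma takes effect only when OpenMP is enabled at build time.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Reference triple loop: C (MxN) = A (MxK) * B (KxN), all row-major. */
void gemm_naive(int M, int N, int K,
                const float *A, const float *B, float *C)
{
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[(size_t)i * K + k] * B[(size_t)k * N + j];
            C[(size_t)i * N + j] = acc;
        }
    }
}

/* Illustrative optimized variant (hypothetical, not from the paper):
 * rows of C are split across OpenMP threads, and the inner j loop
 * handles four columns per iteration, with NEON where available. */
void gemm_opt(int M, int N, int K,
              const float *A, const float *B, float *C)
{
    assert(A && B && C && M >= 0 && N >= 0 && K >= 0);

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < M; ++i) {
        float *c = C + (size_t)i * N;
        for (int j = 0; j < N; ++j)
            c[j] = 0.0f;

        for (int k = 0; k < K; ++k) {
            const float a = A[(size_t)i * K + k];
            const float *b = B + (size_t)k * N;
            int j = 0;
#if defined(__ARM_NEON)
            /* NEON: four float lanes per step, c[j..j+3] += b[j..j+3] * a */
            for (; j + 4 <= N; j += 4) {
                float32x4_t vc = vld1q_f32(c + j);
                float32x4_t vb = vld1q_f32(b + j);
                vc = vmlaq_n_f32(vc, vb, a);
                vst1q_f32(c + j, vc);
            }
#else
            /* Scalar fallback: the same loop unrolled by four */
            for (; j + 4 <= N; j += 4) {
                c[j]     += a * b[j];
                c[j + 1] += a * b[j + 1];
                c[j + 2] += a * b[j + 2];
                c[j + 3] += a * b[j + 3];
            }
#endif
            /* Tail for N not divisible by four */
            for (; j < N; ++j)
                c[j] += a * b[j];
        }
    }
}
```

With GCC, a build along the lines of `gcc -O2 -fopenmp` enables the pragma; without that flag the code still compiles and runs single-threaded. The sketch says nothing about the speedup reported in the abstract, which depends on the authors' own kernels, matrix sizes, and test setup.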

English keywords:

inference framework; GEMM; OpenMP; NEON; Feiteng D2000

Authors:

Zheng En (郑恩), Bai Linting (白林亭), Wen Pengcheng (文鹏程)

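Author affiliations:

AVIC Xi'an Aeronautics Computing Technique Research Institute, Xi'an, Shaanxi 710000

Aviation Key Laboratory of Science and Technology on Airborne and Missileborne Computer, Xi'an, Shaanxi 710000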

Keywords (translated from the Chinese):

inference framework; GEMM; OpenMP; NEON; Feiteng D2000

Funding:

Aeronautical Science Foundation of China

Project number:

2022Z071031001

Publication year:

2024

Aeronautical Computing Technique (航空计算技术)

AVIC Xi'an Aeronautics Computing Technique Research Institute

CSTPCD

Impact factor: 0.316

ISSN: 1671-654X

Year, volume (issue): 2024, 54(3)

Number of references: 11