Study on Model Migration of Natural Language Processing for Domestic Deep Learning Platform

Deep learning platforms play an essential role in the development of the new generation of artificial intelligence. In recent years, China's domestic artificial intelligence software and hardware systems, represented by the Ascend platform, have developed rapidly, opening a new path for domestic deep learning platforms. At the same time, to discover and fix potential defects in the Ascend system, the Ascend platform developers have been actively migrating commonly used deep learning models together with researchers. This paper approaches that effort from the perspective of natural language processing (NLP) algorithms. It targets four NLP tasks: machine reading comprehension, neural machine translation, sequence labeling, and text classification. Building on Ascend's high-performance hardware, it investigates the migration of four typical NLP models, ALBERT, RNNSearch, BERT-CRF, and TextING, to the platform. Based on this migration work, the paper identifies and organizes the main deficiencies of the Ascend architecture design for NLP research and applications, which fall into four areas: 1) the dynamic space allocation characteristics of computation graph nodes; 2) the sinking of resource operators to the device side; 3) graph-kernel fusion, which is not flexible enough to handle unseen model structures; and 4) mixed-precision training. For each problem, the paper proposes a workaround or solution and verifies it experimentally. Finally, it offers directions for future optimization and related suggestions for deep learning platforms, including but not limited to those developed in China.

Natural language processing; Ascend; Deep learning; Model migration; Platform architecture

Ge Huibin (葛慧斌), Wang Dexin (王德鑫), Zheng Tao (郑涛), Zhang Ting (张婷), Xiong Deyi (熊德意)


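College of Intelligence and Computing, Tianjin University, Tianjin 300350, China

Nanjing Research Institute, Huawei Technologies Co., Ltd., Nanjing 210000, China

Global Tone Communication Technology Co., Ltd., Beijing 100131, China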

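Huawei Technologies Co., Ltd. and Tianjin University NRE cooperation project (20101300441922C); National Key Research and Development Program of China (2020AAA0108000); Key Research and Development Program of Yunnan Province (202203AA080004)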

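Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)
Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(1)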