
Research on a Tibetan Large Language Model Based on Parameter-Efficient Fine-Tuning

A large language model is a deep learning model with a very large number of parameters, offering powerful representation learning and generation capabilities, and it has had a profound impact on fields such as natural language processing. With continuing technical progress, large models keep achieving breakthroughs in performance and application scope and have become a research hotspot in artificial intelligence. However, their development also faces challenges, such as high training cost, parameter redundancy, and limitations in cross-lingual applications. In particular, for Tibetan, a language with distinctive linguistic characteristics, research on large models is still at an early stage and lacks corresponding models and resources. To address these issues, this paper applies LoRA-based parameter-efficient fine-tuning and builds the Tibetan-Llama2 and Tibetan-Alpaca models on the Llama2 architecture. After incremental pre-training and instruction fine-tuning on relatively large-scale data, the two models can understand and generate long Tibetan texts, demonstrate multi-task learning ability, and show broad application prospects across multiple domains.
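For readers unfamiliar with the method named in the abstract, the following is a minimal, hypothetical sketch of how LoRA-based parameter-efficient fine-tuning can be applied to a Llama2-style model with the Hugging Face PEFT library. It is not the authors' implementation; the checkpoint name, rank, scaling factor, and target modules are assumptions chosen purely for illustration.

    # Minimal sketch of LoRA-based parameter-efficient fine-tuning on a Llama2-style
    # model using the Hugging Face PEFT library. This is NOT the paper's code; the
    # checkpoint name, rank, scaling, and target modules below are illustrative
    # assumptions only.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, TaskType, get_peft_model

    # Load a Llama2-style base model whose original weights stay frozen (assumed checkpoint name).
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    # LoRA adds trainable low-rank matrices B and A to selected weight matrices
    # (effectively W + B*A), so only a small fraction of parameters is updated
    # during incremental pre-training and instruction fine-tuning.
    config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,                                  # low-rank dimension (assumed)
        lora_alpha=32,                        # LoRA scaling factor (assumed)
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # attention projections (assumed)
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()        # reports the small trainable share

Because only the low-rank adapters are trained, this style of fine-tuning keeps the memory and compute cost far below full-parameter training, which is what makes adapting a Llama2-scale model to a lower-resource language such as Tibetan practical.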

natural language processing; Tibetan large language model; parameter-efficient fine-tuning; incremental pre-training; instruction fine-tuning

杨毛加、柔特、才智杰、官却才让、贡去卓么


School of Computer Science, Qinghai Normal University, Xining, Qinghai 810016, China

State Key Laboratory of Tibetan Intelligent Information Processing and Application, Xining, Qinghai 810008, China


2024

Journal of Chinese Information Processing
Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences


Indexed in: CSTPCD, CHSSCD, PKU Core Journals
Impact factor: 0.8
ISSN: 1003-0077
Year, Volume (Issue): 2024, 38(12)