
Research on the Application of Tibetan Sentence Vector Pre-trained Model in Embedded Systems

This paper investigates the problem of deploying Tibetan sentence vector pre-trained models to embedded systems for inference and testing. In machine learning, encoding and representing text is difficult, which has made sentence vector techniques an important research direction in natural language processing (NLP). However, research on sentence vectors in the Tibetan NLP domain is relatively limited. This paper therefore analyzes the existing pre-trained models and sentence vector representation methods for Tibetan and designs an improved unsupervised SimCSE method (Improved Simple Contrastive Learning of Unsupervised Sentence Embeddings, I-SimCSE). Experimental results show that the Tibetan sentence vector model trained with I-SimCSE outperforms models trained with other methods. Furthermore, this paper explores the combination of edge computing with pre-trained models and discusses potential application scenarios of pre-trained language models on embedded systems. Finally, the I-SimCSE sentence vector model is deployed on the Jetson TX1 embedded device and its average single-inference time is measured, showing that deploying pre-trained language models for inference on embedded systems is feasible. In summary, this work provides a useful reference for applying Tibetan sentence vector pre-trained models on embedded systems and offers guidance and insights for the future development of Tibetan large models on embedded systems.
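The I-SimCSE method named in the abstract builds on unsupervised SimCSE, in which each sentence is encoded twice with independent dropout masks so that the two resulting embeddings form a positive pair and the other sentences in the batch serve as negatives. The following is a minimal PyTorch sketch of that baseline contrastive objective, not the authors' I-SimCSE implementation; the encoder wrapper that returns pooled sentence embeddings and the temperature value 0.05 are assumptions for illustration.

# Minimal sketch of the unsupervised SimCSE objective (baseline, not I-SimCSE).
# Assumes `encoder` returns pooled sentence embeddings of shape (batch, dim)
# and is in train() mode so dropout yields two different views of each sentence.
import torch
import torch.nn.functional as F

def simcse_unsup_loss(encoder, input_ids, attention_mask, temperature=0.05):
    z1 = encoder(input_ids, attention_mask)   # first view  (batch, dim)
    z2 = encoder(input_ids, attention_mask)   # second view (batch, dim)
    # Cosine similarity between every pair of views, scaled by temperature.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    # The positive for sentence i is its own second view, i.e. the diagonal entry.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)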
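For the latency experiment mentioned above, average single-inference time on a device such as the Jetson TX1 is typically obtained by repeating the same forward pass and averaging wall-clock time after a warm-up phase. The sketch below assumes a Hugging Face-style model and tokenizer interface and CUDA execution; it illustrates the measurement procedure only and is not the paper's benchmark script.

# Hedged sketch: average per-inference latency for one tokenized sentence.
import time
import torch

@torch.no_grad()
def average_latency(model, tokenizer, sentence, runs=100, warmup=10, device="cuda"):
    model.eval().to(device)
    batch = tokenizer(sentence, return_tensors="pt").to(device)
    for _ in range(warmup):            # warm up caches and CUDA kernels
        model(**batch)
    if device == "cuda":
        torch.cuda.synchronize()       # wait for queued GPU work to finish
    start = time.perf_counter()
    for _ in range(runs):
        model(**batch)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs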

Keywords: Tibetan; sentence vector representation; embedded system; pre-trained model

吕皓、吕慧、雍宾宾、多拉、李妍、周庆国、周睿


School of Information Science and Engineering, Lanzhou University, Lanzhou 730000

State Key Laboratory of Tibetan Intelligent Information Processing and Application (co-established by province and ministry), Qinghai Normal University, Xining 810008


2025

Journal of Chinese Computer Systems (小型微型计算机系统)
Shenyang Institute of Computing Technology, Chinese Academy of Sciences


Indexed in: Peking University Core Journals (北大核心)
Impact factor: 0.564
ISSN: 1000-1220
Year, Volume (Issue): 2025, 46(1)