An Automatic Method for Constructing a Tibetan Sentiment Dictionary Based on Convolutional Neural Networks
Study on the Automatic Construction Method of Tibetan Sentiment Dictionary Based on C-TF
公确多杰¹, 索南才让¹
Author Information
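- 1. School of Computer Science, Qinghai Normal University, Xining, Qinghai 810016; Key Laboratory of Tibetan Information Processing, Ministry of Education, Xining, Qinghai 810008; Qinghai Provincial Key Laboratory of Tibetan Information Processing and Machine Translation, Xining, Qinghai 810008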
Abstract (Chinese version)
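To address the particular challenges of Tibetan sentiment analysis, notably the lack of annotated data and limited language resources, this paper proposes C-TF, an automatic method for constructing a Tibetan sentiment dictionary that combines a convolutional neural network (CNN) with term frequency. Tibetan has a wealth of emotionally expressive text. The paper computes term-frequency statistics for sentiment words in the Eight Great Tibetan Operas of traditional Tibetan literature and in social media comments, and combines these frequencies with a CNN to derive sentiment seed words. The model is pre-trained on large-scale unlabeled data and fine-tuned on a small amount of labeled data, finally yielding a Tibetan sentiment dictionary of 12,503 sentiment words. The proposed construction method offers new ideas and experimental evidence for further research on sentiment classification of Tibetan text.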
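The abstract does not say how term frequency and the CNN output are combined to select seed words. Purely as an illustration, the sketch below assumes each candidate word is scored as its normalized term frequency multiplied by a classifier's polarity confidence, with near-neutral words discarded. The CNN is replaced by a plain callable, and the names `select_seed_words`, `polarity_prob`, `top_k`, and `min_confidence` are hypothetical, not taken from the paper.

```python
from collections import Counter
from typing import Callable, Iterable


def term_frequencies(corpus: Iterable[list[str]]) -> Counter:
    """Count how often each token occurs across tokenized documents."""
    counts: Counter = Counter()
    for tokens in corpus:
        counts.update(tokens)
    return counts


def _top(scores: dict[str, float], k: int) -> list[str]:
    """Return the k highest-scoring words, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def select_seed_words(
    corpus: Iterable[list[str]],
    polarity_prob: Callable[[str], float],
    top_k: int = 2,
    min_confidence: float = 0.6,
) -> dict[str, list[str]]:
    """Pick positive/negative seed words by normalized TF x classifier confidence.

    polarity_prob(word) is a hypothetical stand-in for a trained CNN that
    returns P(positive | word); P(negative | word) = 1 - P(positive | word).
    """
    tf = term_frequencies(corpus)
    if not tf:
        return {"positive": [], "negative": []}
    max_tf = max(tf.values())
    pos_scores: dict[str, float] = {}
    neg_scores: dict[str, float] = {}
    for word, count in tf.items():
        p_pos = polarity_prob(word)
        confidence = max(p_pos, 1.0 - p_pos)
        if confidence < min_confidence:
            continue  # the classifier sees this word as roughly neutral
        score = (count / max_tf) * confidence
        target = pos_scores if p_pos >= 0.5 else neg_scores
        target[word] = score
    return {"positive": _top(pos_scores, top_k), "negative": _top(neg_scores, top_k)}


# Toy usage: English tokens stand in for segmented Tibetan words, and a
# lookup table stands in for the CNN (unknown words get a neutral 0.5).
toy_corpus = [["happy", "joy", "the"], ["sad", "the", "happy"], ["joy", "happy", "grief"]]
toy_probs = {"happy": 0.9, "joy": 0.8, "sad": 0.1, "grief": 0.2}
seeds = select_seed_words(toy_corpus, lambda w: toy_probs.get(w, 0.5))
# seeds == {"positive": ["happy", "joy"], "negative": ["sad", "grief"]}
```

Multiplying by frequency favors words that are both common and confidently polar; a real system would likely also need a minimum count so that rare tokens with extreme classifier scores do not crowd out reliable seeds.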
Abstract
To address the particular challenges of sentiment analysis in Tibetan, such as the lack of annotated data and limited language resources, this paper proposes an automatic method for constructing a Tibetan sentiment dictionary that combines a convolutional neural network (CNN) with term frequency (C-TF). Tibetan has a rich body of emotionally expressive text. We computed term-frequency statistics for sentiment words collected from the Eight Great Tibetan Operas of traditional Tibetan literature and from social media comments, and combined these frequencies with a CNN to derive sentiment seed words. Then, using large-scale unlabeled data for pre-training and a small amount of labeled data for fine-tuning, we constructed a Tibetan sentiment dictionary of 12,503 sentiment words. To evaluate the dictionary's accuracy, we compared it with other dictionaries on the open-source sentiment analysis dataset TU_SA; the experimental results show that our method achieves a significant performance improvement on the sentiment dictionary construction task.
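The abstract reports a comparison of dictionaries on TU_SA but does not describe the evaluation procedure. One common way to compare sentiment lexicons is to use each as a simple rule-based classifier and measure accuracy against gold labels. The sketch below assumes that setup, with integer polarities (+1/-1), pre-tokenized samples, and sums of zero predicted as neutral (0). The function names and data layout are hypothetical and reflect neither the TU_SA file format nor the authors' protocol.

```python
def lexicon_polarity(tokens: list[str], lexicon: dict[str, int]) -> int:
    """Return +1, -1, or 0 from the summed polarities of tokens found in the lexicon."""
    total = sum(lexicon.get(tok, 0) for tok in tokens)
    return (total > 0) - (total < 0)  # sign of the sum


def lexicon_accuracy(samples: list[tuple[list[str], int]], lexicon: dict[str, int]) -> float:
    """Fraction of (tokens, gold_label) pairs whose predicted polarity equals the gold label."""
    if not samples:
        return 0.0
    correct = sum(lexicon_polarity(tokens, lexicon) == gold for tokens, gold in samples)
    return correct / len(samples)


# Toy comparison of two hypothetical lexicons on four labeled samples.
samples = [
    (["happy", "day"], 1),
    (["sad", "news"], -1),
    (["joy", "grief", "joy"], 1),
    (["grief"], -1),
]
small_lexicon = {"happy": 1, "sad": -1}
larger_lexicon = {"happy": 1, "sad": -1, "joy": 1, "grief": -1}
print(lexicon_accuracy(samples, small_lexicon))   # 0.5 (misses "joy" and "grief")
print(lexicon_accuracy(samples, larger_lexicon))  # 1.0
```

In this setup a larger dictionary helps only when its added entries actually occur in the evaluation text with the correct polarity, so coverage and polarity precision both affect the score.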
Key words
sentiment dictionary construction / low-resource language / CNN / C-TF / Tibetan sentiment analysis
Publication year
2024