
Research on the Construction and Automatic Annotation Method of a Tibetan Sentiment Corpus

Tibetan sentiment analysis lacks basic training corpora, while models need large amounts of data to support them, and traditional manual annotation consumes substantial human and material resources and generalizes poorly. To address this, a fine-grained Tibetan sentiment corpus and sentiment dictionary are constructed. First, three annotators separately label the sentiment intensity of each word. Next, the corpus is matched against the dictionary according to rules. Finally, the average sentiment intensity score is used to represent the sentiment category of the text. The fine-grained sentiment resources constructed in this paper can, to some extent, shorten the development cycle of large-scale annotated corpora and reduce the labor cost of corpus annotation.
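
The three steps in the abstract (per-word intensity labels from three annotators, rule-based matching of text against the dictionary, and an average intensity score that determines the category) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the intensity scale from -2 to +2, the averaging of the three annotators' scores per word, the exact-match rule, the 0.5 threshold, and the placeholder tokens such as `w_joy`, which stand in for already segmented Tibetan words.

```python
from statistics import mean

# Hypothetical scale: -2 (strongly negative) to +2 (strongly positive).
# Each tuple holds the scores given by three annotators for one word.
# The tokens are placeholders for segmented Tibetan words.
ANNOTATIONS = {
    "w_joy": (2, 2, 1),
    "w_calm": (1, 1, 0),
    "w_sad": (-1, -2, -1),
    "w_angry": (-2, -2, -2),
}


def build_lexicon(annotations):
    """Assumed aggregation: average the annotators' scores per word."""
    return {word: mean(scores) for word, scores in annotations.items()}


def label_text(tokens, lexicon, threshold=0.5):
    """Match tokens against the lexicon and label by mean matched intensity.

    The matching rule (exact token lookup) and the threshold are
    illustrative assumptions, not the rules used in the paper.
    """
    matched = [lexicon[t] for t in tokens if t in lexicon]
    if not matched:
        return 0.0, "neutral"
    score = mean(matched)
    if score >= threshold:
        return score, "positive"
    if score <= -threshold:
        return score, "negative"
    return score, "neutral"


lexicon = build_lexicon(ANNOTATIONS)
score, label = label_text(["w_joy", "w_other", "w_calm"], lexicon)
print(round(score, 2), label)
```

In this sketch, tokens that are missing from the dictionary are simply ignored, so a text with no matched words falls back to a neutral label. A real pipeline would also need Tibetan word segmentation before matching, which is outside the scope of this example.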

Tibetan sentiment corpus; fine-grained sentiment; sentiment intensity; automatic annotation

Jianyangcuo (尖羊措), Anjiancairang (安见才让)


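School of Computer Science, Qinghai Minzu University, Xining, Qinghai 810007, China

State Key Laboratory of Tibetan Intelligent Information Processing and Application (jointly established by the province and the ministry)

Qinghai Provincial Key Laboratory of Tibetan Information Processing and Machine Translation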


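Open Project of the State Key Laboratory of Tibetan Intelligent Information Processing and Application / Qinghai Provincial Key Laboratory of Tibetan Information Processing and Machine Translation; Graduate Innovation Project of the School of Computer Science, Qinghai Minzu University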

2021-Z-00109M2022004

2023

Computer Era (计算机时代)
Zhejiang Institute of Computing Technology; Zhejiang Computer Society


Impact factor: 0.411
ISSN: 1006-8228
Year, volume (issue): 2023, (12)