
Status Analysis of Construction of Tibetan Emotional Dictionary

In recent years, many researchers have shown that deep-learning-based multi-feature fusion sentiment analysis methods extract the emotional information in text better than pure deep learning methods, and sentiment-word features are among the most important of these features. Although a few Tibetan sentiment lexicons exist, they are largely unpublished, so anyone who needs Tibetan sentiment lexicon resources must build their own. Studying how Tibetan sentiment lexicons are currently constructed can therefore support future construction work. To understand how sentiment words are classified, which construction methods are commonly used, and the size and composition of existing Tibetan sentiment lexicons, we compare and statistically analyze the literature on Tibetan sentiment lexicon construction from the past 10 years (mainly from CNKI) and summarize the state of research. We find that the main classification schemes for sentiment words use 7 major categories with 21 subcategories, 12 major categories with 20 subcategories, or 2 major categories with 18 subcategories. Construction methods for Tibetan sentiment lexicons include lexicon matching, machine translation, SO-PMI expansion, and similarity-based expansion using word2vec or BERT. Existing Tibetan sentiment lexicons contain roughly 5,000 to 28,000 entries, close to the scale of Chinese sentiment lexicons, and consist mainly of sentiment words, degree adverbs, negation words, double-negation words, and emoticons. We hope this survey offers a reference for researchers, especially those building Tibetan sentiment lexicons.
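SO-PMI expansion is one of the construction methods listed above. As a minimal sketch, assuming the usual definition (a word's SO-PMI is the sum of its PMI with positive seed words minus the sum of its PMI with negative seed words), document-level co-occurrence counts, and add-0.01 smoothing of the joint count, the expansion step could look like the code below. The function names, the threshold of 1.0, and the toy English tokens (standing in for segmented Tibetan words) are illustrative assumptions, not the implementations of the surveyed papers.

```python
import math
from collections import Counter


def count_document_frequencies(docs):
    """Count how many documents contain each word and each unordered word pair."""
    word_df = Counter()
    pair_df = Counter()
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        for i, a in enumerate(vocab):
            for b in vocab[i + 1:]:
                pair_df[(a, b)] += 1  # a < b because vocab is sorted
    return word_df, pair_df, len(docs)


def pmi(w1, w2, word_df, pair_df, n_docs, smoothing=0.01):
    """PMI = log2(P(w1, w2) / (P(w1) * P(w2))) over documents.

    `smoothing` (an assumed value) is added to the joint count so that
    pairs that never co-occur do not produce log(0).
    """
    if word_df[w1] == 0 or word_df[w2] == 0:
        return 0.0  # unseen word: no evidence either way
    key = (w1, w2) if w1 < w2 else (w2, w1)
    p_joint = (pair_df[key] + smoothing) / n_docs
    p1 = word_df[w1] / n_docs
    p2 = word_df[w2] / n_docs
    return math.log2(p_joint / (p1 * p2))


def so_pmi(word, pos_seeds, neg_seeds, stats):
    """Semantic orientation: association with positive seeds minus negative seeds."""
    word_df, pair_df, n_docs = stats
    pos = sum(pmi(word, p, word_df, pair_df, n_docs) for p in pos_seeds)
    neg = sum(pmi(word, n, word_df, pair_df, n_docs) for n in neg_seeds)
    return pos - neg


def expand_lexicon(candidates, pos_seeds, neg_seeds, stats, threshold=1.0):
    """Label candidates whose |SO-PMI| exceeds `threshold`; skip the rest."""
    labels = {}
    for word in candidates:
        score = so_pmi(word, pos_seeds, neg_seeds, stats)
        if score > threshold:
            labels[word] = "positive"
        elif score < -threshold:
            labels[word] = "negative"
    return labels


docs = [
    ["good", "happy", "movie"],
    ["good", "happy"],
    ["bad", "sad", "movie"],
    ["bad", "sad"],
]
stats = count_document_frequencies(docs)
print(expand_lexicon(["happy", "sad", "movie"], ["good"], ["bad"], stats))
# {'happy': 'positive', 'sad': 'negative'}
```

Here "movie" co-occurs equally often with both seeds, so its SO-PMI is 0 and it is left out of the lexicon. In practice the threshold and the seed lists would need tuning on real Tibetan corpora.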
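For the similarity-based expansion with word2vec or BERT, a minimal sketch is to compare each candidate word's vector with the seed words' vectors by cosine similarity and give it the polarity of the closer seed group. The toy 3-dimensional vectors below stand in for embeddings that would really come from a trained word2vec model or from BERT; the 0.6 cut-off, the nearest-seed rule, and the function names are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_by_similarity(vectors, pos_seeds, neg_seeds, min_sim=0.6):
    """Assign each non-seed word the polarity of its most similar seed.

    Words whose best similarity is below `min_sim` (an assumed cut-off),
    or that are equally close to both groups, are not added.
    """
    pos_vecs = [vectors[s] for s in pos_seeds if s in vectors]
    neg_vecs = [vectors[s] for s in neg_seeds if s in vectors]
    if not pos_vecs or not neg_vecs:
        raise ValueError("need at least one positive and one negative seed with a vector")
    seeds = set(pos_seeds) | set(neg_seeds)
    new_entries = {}
    for word, vec in vectors.items():
        if word in seeds:
            continue
        best_pos = max(cosine(vec, s) for s in pos_vecs)
        best_neg = max(cosine(vec, s) for s in neg_vecs)
        if max(best_pos, best_neg) < min_sim or best_pos == best_neg:
            continue
        new_entries[word] = "positive" if best_pos > best_neg else "negative"
    return new_entries


# Toy vectors standing in for real word2vec or BERT embeddings.
vectors = {
    "good": [1.0, 0.0, 0.0],
    "bad": [0.0, 1.0, 0.0],
    "great": [0.9, 0.1, 0.0],
    "awful": [0.1, 0.9, 0.0],
    "table": [0.0, 0.0, 1.0],
}
print(expand_by_similarity(vectors, ["good"], ["bad"]))
# {'great': 'positive', 'awful': 'negative'}
```

Taking the maximum similarity to any single seed, rather than the mean over all seeds, is a design choice; averaging over the seed group is an equally plausible variant.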
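Existing lexicons are described as consisting mainly of sentiment words, degree adverbs, negation words, double-negation words, and emoticons. The sketch below shows one common way such a lexicon can be applied in lexicon-matching sentiment scoring. The rule that modifiers are read from up to 3 tokens to the left of a sentiment word, the 1.5 weight for double-negation words, and all names and toy tokens are hypothetical assumptions; actual Tibetan word order may call for different rules.

```python
# Hypothetical weight for a double-negation word (affirmation with emphasis);
# an assumption for illustration, not a value from the surveyed lexicons.
DOUBLE_NEGATION_WEIGHT = 1.5


def score_tokens(tokens, sentiment, degree, negations, double_negations, window=3):
    """Sum lexicon polarities, adjusting each sentiment word by the modifiers
    found in up to `window` tokens to its left (an assumed modifier position)."""
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in sentiment:
            continue
        value = sentiment[token]
        # Walk leftwards, stopping at the window edge or at the previous sentiment word.
        for j in range(i - 1, max(i - window, 0) - 1, -1):
            prev = tokens[j]
            if prev in sentiment:
                break
            if prev in degree:
                value *= degree[prev]  # intensify or soften
            elif prev in negations:
                value = -value  # each negation flips polarity, so two cancel out
            elif prev in double_negations:
                value *= DOUBLE_NEGATION_WEIGHT  # affirmative, with emphasis
        total += value
    return total


sentiment = {"good": 1.0, "bad": -1.0, ":)": 1.0}  # emoticons are ordinary entries
degree = {"very": 2.0}
negations = {"not"}
double_negations = {"cannot_not"}
print(score_tokens(["not", "very", "good"], sentiment, degree, negations, double_negations))
# -2.0
```

Keeping degree adverbs, negation words, and double-negation words in separate tables mirrors the lexicon composition described above and lets each table be extended independently.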

Keywords: Tibetan emotional lexicon; emotional word classification; lexicon construction method; vocabulary; vocabulary composition

Authors: 才让东知 (Cairang Dongzhi), 杨杰 (Yang Jie), 尼玛扎西 (Nima Zhaxi)


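Engineering Research Center of Tibetan Information Technology, Ministry of Education, Lhasa, Tibet 850000

School of Information Science and Technology, Tibet University, Lhasa, Tibet 850000

Collaborative Innovation Center for Tibet Informatization (jointly built by the Ministry and the Autonomous Region), Lhasa, Tibet 850000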


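Funding: National Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0116101); Tibet University "High-Level Talent Training Program" Project for 2021 Postgraduates (2021-GSP-S129)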

Computer Technology and Development (计算机技术与发展), Shaanxi Computer Society

Indexed in: CSTPCD
Impact factor: 0.621
ISSN: 1673-629X
Year, volume (issue): 2024, 34(3)