Journal of Northeast Normal University (Natural Science Edition), 2024, Vol. 56, Issue 2: 83-90. DOI: 10.16163/j.cnki.dslkxb202302280002

Research on public sentiment analysis of COVID-19 based on MacBERT and label smoothing

Wang Kunpeng (王坤朋)¹, Yu Long (禹龙)², Wang Bo (王博)¹, Zhou Tiejun (周铁军)³, Tian Shengwei (田生伟)¹

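Author information

  • 1. School of Software, Xinjiang University, Urumqi 830091, Xinjiang
  • 2. Network and Information Technology Center, Xinjiang University, Urumqi 830046, Xinjiang
  • 3. Xinjiang Internet Information Center, Urumqi 830001, Xinjiang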

Abstract

To address the mismatch between BERT pre-training and downstream fine-tuning, as well as possible errors in manually annotated sentiment labels, a network model based on MacBERT and label smoothing (MacLMC) is proposed. First, the MLM-as-correction strategy is introduced on top of BERT: masked words are replaced with synonyms, and word vectors are obtained from the MacBERT pre-trained model. Second, a two-layer LSTM learns long-distance dependencies. Third, dual-channel convolutions with multiple convolution kernels extract the maximum and average features, respectively. Finally, label smoothing lowers the probability the model assigns to its predicted class, which improves the model's tolerance of label errors and its generalization. Experimental results show that, compared with existing mainstream models, the proposed model performs better on multiple datasets and is better suited to public sentiment analysis of the COVID-19 epidemic.
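The label smoothing step can be illustrated with a minimal sketch. It uses the common uniform variant, in which a fraction `epsilon` of the target probability is spread evenly over all classes. The abstract does not state the smoothing coefficient, the number of sentiment classes, or the exact variant used, so `epsilon=0.1`, the three-class setup, and all function names below are assumptions for illustration only.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Build a smoothed target distribution for one example.

    Assumed uniform variant: (1 - epsilon) * one_hot + epsilon / num_classes.
    """
    target = [epsilon / num_classes] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def smoothed_cross_entropy(logits, true_class, epsilon=0.1):
    """Cross-entropy between the smoothed target and the model's softmax output."""
    log_probs = log_softmax(logits)
    target = smooth_labels(true_class, len(logits), epsilon)
    return -sum(t * lp for t, lp in zip(target, log_probs))


# Hypothetical three-class example (negative, neutral, positive), labeled class 0.
logits = [4.0, 0.5, -1.0]
hard_loss = smoothed_cross_entropy(logits, true_class=0, epsilon=0.0)
soft_loss = smoothed_cross_entropy(logits, true_class=0, epsilon=0.1)
print(hard_loss, soft_loss)
```

With `epsilon=0.0` the loss reduces to ordinary cross-entropy. With `epsilon > 0`, a confident prediction for the labeled class is penalized slightly more, which discourages the model from pushing its predicted probability toward 1 and makes training less sensitive to an occasional wrong label.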

Key words

COVID-19/MacBERT/label smoothing/sentiment analysis


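Funding

Major Science and Technology Project of Xinjiang Uygur Autonomous Region (2020A03004-4)

Key Research and Development Project of Xinjiang Uygur Autonomous Region (2021B01002)

Key Project of the National Natural Science Foundation of China (U2003208)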

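Publication year

2024

Journal of Northeast Normal University (Natural Science Edition)
Northeast Normal University

Indexed in: CSTPCD, Peking University Core Journals
Impact factor: 0.612
ISSN: 1000-1832
Number of references: 27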