Sentence-level sentiment classification aims to mine the sentiment semantics in text, and deep network models based on BERT (bidirectional encoder representations from transformers) currently perform best on this task. The performance of such models depends heavily on large amounts of high-quality labeled data, yet labeled samples are often scarce in practice, so deep neural networks (DNNs) tend to overfit on small sample sets and struggle to accurately capture the implicit sentiment features of sentences. Although existing semi-supervised models make effective use of the features of unlabeled samples, they do not effectively handle the gradual accumulation of errors that introducing unlabeled samples may cause. Moreover, after predicting on the test set, semi-supervised models do not re-evaluate and correct their previous labeling results, and thus cannot fully exploit the feature information of the test data. This study proposes a new semi-supervised sentence sentiment classification model. The model first introduces a weighting mechanism based on the K-nearest-neighbor algorithm, which assigns higher weights to high-confidence samples so as to minimize the propagation of erroneous information during model training. It then adopts a two-stage training strategy that enables the model to promptly correct misclassified samples in the test data. Tests on multiple datasets demonstrate that the model also achieves good performance on small sample sets.
A semi-supervised model for sentence-level sentiment classification
Sentence-level sentiment classification is an important task that extracts sentiment semantics from text. Currently, the best-performing approaches rely on deep neural networks, particularly BERT-based models. However, these models require large, high-quality labeled datasets to perform well, while in practice labeled data is usually scarce, which leads to overfitting on small datasets and difficulty in capturing subtle sentiment features. Although existing semi-supervised models exploit the features of large unlabeled datasets, they do not effectively handle the gradual accumulation of errors introduced by pseudo-labeled samples. In addition, after predicting on the test set, these models do not re-evaluate or correct their earlier labeling results and therefore cannot fully exploit the feature information of the test data. To address these issues, this paper proposes a semi-supervised sentence sentiment classification model. First, a K-nearest-neighbor-based weighting mechanism is designed that assigns higher weights to high-confidence samples, minimizing error propagation during parameter learning. Second, a two-stage training mechanism is adopted that enables the model to correct misclassified samples in the test data. Extensive experiments on multiple datasets show that the proposed method achieves strong performance on small datasets.
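As a rough illustration of the K-nearest-neighbor weighting idea described above, the short Python sketch below weights each pseudo-labeled sample by how strongly its K nearest labeled neighbors agree with its pseudo-label. The function name knn_sample_weights, the use of Euclidean distance over precomputed sentence embeddings (e.g. from a BERT encoder), and the agreement-based weight are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def knn_sample_weights(unlabeled_emb, unlabeled_pseudo, labeled_emb, labeled_y, k=5):
    # Weight each pseudo-labeled sample by the fraction of its K nearest
    # labeled neighbors whose gold label matches the pseudo-label.
    weights = np.zeros(len(unlabeled_emb))
    for i, (emb, pseudo) in enumerate(zip(unlabeled_emb, unlabeled_pseudo)):
        dists = np.linalg.norm(labeled_emb - emb, axis=1)   # distances to all labeled samples
        nearest = np.argsort(dists)[:k]                     # indices of the K closest
        weights[i] = np.mean(labeled_y[nearest] == pseudo)  # agreement ratio in [0, 1]
    return weights

# Toy usage with 2-D embeddings and binary sentiment labels.
rng = np.random.default_rng(0)
labeled_emb = rng.normal(size=(20, 2)) + np.array([[2.0, 0.0]] * 10 + [[-2.0, 0.0]] * 10)
labeled_y = np.array([1] * 10 + [0] * 10)
unlabeled_emb = rng.normal(size=(5, 2))
unlabeled_pseudo = np.array([1, 0, 1, 0, 1])
print(knn_sample_weights(unlabeled_emb, unlabeled_pseudo, labeled_emb, labeled_y, k=3))

Under this toy scheme, samples whose pseudo-labels conflict with their labeled neighbors receive weights near zero, so their contribution to training is suppressed, which matches the stated goal of limiting the propagation of erroneous information during model training.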