Sentence-level sentiment classification aims to mine the sentiment semantics in text, and deep network models based on BERT (bidirectional encoder representations from transformers) currently perform best on this task. The performance of such models depends heavily on large amounts of high-quality labeled data, yet labeled samples are often scarce in practice, so deep neural networks (DNNs) tend to overfit on small sample sets and struggle to accurately capture the implicit sentiment features of sentences. Although existing semi-supervised models make effective use of the features of unlabeled samples, they do not effectively handle the gradual accumulation of errors that introducing unlabeled samples may cause. Moreover, after predicting on the test set, semi-supervised models do not re-evaluate and correct their previous labeling results, and thus cannot fully exploit the feature information of the test data. This study proposes a new semi-supervised sentence sentiment classification model. The model first introduces a weighting mechanism based on the K-nearest-neighbor algorithm, which assigns higher weights to high-confidence samples so as to minimize the propagation of erroneous information during model training. It then adopts a two-stage training strategy that enables the model to promptly correct misclassified samples in the test data. Tests on multiple datasets demonstrate that the model also achieves good performance on small sample sets.
A semi-supervised model for sentence-level sentiment classification
Sentence-level sentiment classification is an important task that extracts sentiment semantics from text. Currently, the best-performing approaches rely on deep neural networks, particularly BERT-based models. However, these models require large, high-quality labeled datasets to perform well, while in practice labeled data is usually scarce, which leads to overfitting on small datasets and difficulty in capturing subtle sentiment features. Although existing semi-supervised models exploit the features of large unlabeled datasets, they do not effectively handle the gradual accumulation of errors introduced by pseudo-labeled samples. In addition, after predicting on the test set, these models do not re-evaluate or correct their earlier labeling results and therefore cannot fully exploit the feature information of the test data. To address these issues, this paper proposes a semi-supervised sentence sentiment classification model. First, a K-nearest-neighbor-based weighting mechanism is designed that assigns higher weights to high-confidence samples, minimizing error propagation during parameter learning. Second, a two-stage training mechanism is adopted that enables the model to correct misclassified samples in the test data. Extensive experiments on multiple datasets show that the proposed method achieves strong performance on small datasets.
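As a rough illustration of the K-nearest-neighbor weighting idea described above, the short Python sketch below weights each pseudo-labeled sample by how strongly its K nearest labeled neighbors agree with its pseudo-label. The function name knn_sample_weights, the use of Euclidean distance over precomputed sentence embeddings (e.g. from a BERT encoder), and the agreement-based weight are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def knn_sample_weights(unlabeled_emb, unlabeled_pseudo, labeled_emb, labeled_y, k=5):
    # Weight each pseudo-labeled sample by the fraction of its K nearest
    # labeled neighbors whose gold label matches the pseudo-label.
    weights = np.zeros(len(unlabeled_emb))
    for i, (emb, pseudo) in enumerate(zip(unlabeled_emb, unlabeled_pseudo)):
        dists = np.linalg.norm(labeled_emb - emb, axis=1)   # distances to all labeled samples
        nearest = np.argsort(dists)[:k]                     # indices of the K closest
        weights[i] = np.mean(labeled_y[nearest] == pseudo)  # agreement ratio in [0, 1]
    return weights

# Toy usage with 2-D embeddings and binary sentiment labels.
rng = np.random.default_rng(0)
labeled_emb = rng.normal(size=(20, 2)) + np.array([[2.0, 0.0]] * 10 + [[-2.0, 0.0]] * 10)
labeled_y = np.array([1] * 10 + [0] * 10)
unlabeled_emb = rng.normal(size=(5, 2))
unlabeled_pseudo = np.array([1, 0, 1, 0, 1])
print(knn_sample_weights(unlabeled_emb, unlabeled_pseudo, labeled_emb, labeled_y, k=3))

Under this toy scheme, samples whose pseudo-labels conflict with their labeled neighbors receive weights near zero, so their contribution to training is suppressed, which matches the stated goal of limiting the propagation of erroneous information during model training.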