Dialogue-Based Chinese Positive Sentiment Style Transfer

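Chinese abstract: Text style transfer aims to give a target text particular attributes, such as politeness, sentiment, or gender, through editing or generation while preserving the content of the text. Existing research on sentiment style transfer has concentrated on English datasets, with relatively little work on Chinese datasets. This paper constructs a dialogue-based Chinese sentiment text dataset. Part of the raw data comes from dialogue in the TV series "Home with Kids" (《家有儿女》) and was annotated both manually and by recurrent model annotation. The dataset currently contains 30,836 sentences of negative and positive sentiment text in total. Because most sentiment words in the dataset are explicit, the study of dialogue-based Chinese positive sentiment style transfer was carried out with editing-based models. Experimental results show that, on this dataset, editing-based models can identify the sentiment attributes of text well and achieve positive sentiment style transfer.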
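Because the dataset's sentiment words are mostly explicit, the first step of an editing-based approach can be pictured as finding and deleting the words that carry the negative style. The following is a minimal sketch of a unigram salience heuristic in the spirit of delete-retrieve-generate; it is not the paper's implementation. The function names, the threshold, the smoothing value, and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter

def attribute_markers(neg_sents, pos_sents, threshold=2.0, smoothing=1.0):
    """Return tokens that are much more frequent in negative than positive text.

    Salience of a token is (count_in_negative + smoothing) /
    (count_in_positive + smoothing). Threshold and smoothing are illustrative.
    """
    neg_counts = Counter(tok for sent in neg_sents for tok in sent)
    pos_counts = Counter(tok for sent in pos_sents for tok in sent)
    markers = set()
    for tok, count in neg_counts.items():
        salience = (count + smoothing) / (pos_counts[tok] + smoothing)
        if salience >= threshold:
            markers.add(tok)
    return markers

def delete_markers(tokens, markers):
    """Drop the negative attribute markers, keeping the content tokens."""
    return [tok for tok in tokens if tok not in markers]

# Hypothetical pre-tokenized toy corpus (not taken from the dataset).
neg_corpus = [["你", "真", "笨"], ["你", "太", "笨", "了"]]  # "you are so stupid"
pos_corpus = [["你", "真", "棒"], ["你", "太", "棒", "了"]]  # "you are so great"

markers = attribute_markers(neg_corpus, pos_corpus)
content = delete_markers(["你", "真", "笨"], markers)
print(markers, content)
```

A later step (retrieval, tagging, or masked-language-model infilling, depending on the model) would then fill the gap with a positive expression; that part is omitted here.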

English title: Chinese positive sentiment style transfer based on dialogues

English abstract:

[Objective] Several studies highlight that negative sentiment dialogues within the family markedly affect individuals' mental and physical well-being. Conversely, positive sentiment dialogues offer individuals constructive feedback, motivating learning and personal growth. Such dialogues help build self-confidence and a positive attitude, enabling people to cope better with life's challenges. Text style transfer is an effective tool for shifting negative sentiment dialogues to positive sentiment dialogues. The goal of text style transfer is to retain the content of the text while imbuing the generated text with specific attributes. Sentiment style transfer is an important research direction in natural language processing, and sentiment style transfer in the context of family dialogues holds practical value. However, the current literature on sentiment style transfer has mainly focused on English datasets, with relatively limited research within the Chinese domain.

[Methods] We constructed a dialogue-based Chinese sentiment text dataset in this study. The initial data was extracted from dialogues in the TV series "Home with Kids", where considerable sentiment differences were observed between dialogues involving the characters Liu Mei and Liu Xing and those involving Liu Mei and Xia Xue. While interactions between Liu Mei and Liu Xing were primarily critical, interactions between Liu Mei and Xia Xue were characterized by encouragement and respect. The dataset was preprocessed in the following steps: (1) Data cleaning, filtering, and format conversion were performed to ensure data quality and consistency. (2) A recurrent model annotation method was employed, using suitable algorithms and models to annotate the data and identify key information and features. Six iterations were performed, with the classifier fine-tuned each time on the data updated in the previous iteration. (3) Manual annotation was also conducted, with the data meticulously reviewed and labeled by hand to further enhance accuracy and reliability. The final dataset comprises 30,836 sentences, including 11,562 sentences with positive sentiment content and 19,274 sentences with negative sentiment content.

[Results] In this dialogue dataset, most texts explicitly contain sentiment-related words. Based on this characteristic, research on dialogue-based Chinese positive sentiment style transfer was conducted using the editing-based delete-retrieve-generate (DRG), tagger and generator (TAG), conditional BERT (CondBERT), and tagging without rewriting (TWR) models. In addition, an improved TWR (TWR*) Transformer model was introduced. The original TWR model used a multilayer perceptron to train a style classifier. To improve the ability to accurately identify specific styles, a style classifier based on the RoBERTa-Large-Chinese model was trained to distinguish different text styles. These experiments demonstrated that using the pretrained language model RoBERTa-Large-Chinese produced better classification results, which was attributed to the close relationship between the attention weights of the penultimate layer in the Transformer model and words commonly associated with positive and negative sentiments. The RoBERTa-Large-Chinese model achieved higher accuracy in recognizing textual sentiment style attribute words.

[Conclusions] Experimental results confirm that the style classifier trained on our dataset can effectively identify negative content within text. In both automated and manual evaluations, the TWR* model outperforms the baseline models in identifying textual sentiment attributes and achieving positive sentiment style transfer, thus verifying the effectiveness of the model enhancements and the validity of the dataset.
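The iterative annotation described in the Methods (six rounds, with the classifier refit each round on the data updated in the previous round) can be sketched roughly as a self-training loop. This is an illustration under stated assumptions, not the authors' pipeline: a toy keyword-count classifier stands in for the fine-tuned model, and the names `train_keyword_classifier`, `iterative_annotation`, and `min_confidence`, as well as the example sentences, are hypothetical.

```python
from collections import Counter

def train_keyword_classifier(labeled):
    """Toy stand-in for fine-tuning: count tokens per label (1 = positive, 0 = negative)."""
    pos, neg = Counter(), Counter()
    for text, label in labeled:
        (pos if label == 1 else neg).update(text.split())

    def predict(text):
        score = sum(pos[tok] - neg[tok] for tok in text.split())
        label = 1 if score > 0 else 0
        return label, abs(score)  # abs(score) serves as a crude confidence

    return predict

def iterative_annotation(seed, unlabeled, rounds=6, min_confidence=1):
    """Refit the classifier each round on the labels accumulated so far."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(rounds):
        predict = train_keyword_classifier(labeled)
        remaining = []
        for text in pool:
            label, confidence = predict(text)
            if confidence >= min_confidence:
                labeled.append((text, label))  # accept the model's label
            else:
                remaining.append(text)  # leave for a later round or manual review
        pool = remaining
        if not pool:
            break
    return labeled, pool

# Hypothetical seed labels and unlabeled sentences.
seed = [("you are great", 1), ("you are stupid", 0)]
unlabeled = ["great job", "stupid idea", "good job", "bad idea"]

labeled, leftover = iterative_annotation(seed, unlabeled)
print(labeled, leftover)
```

In this toy run, "good job" and "bad idea" can only be labeled in the second round, after "job" and "idea" have picked up labels in the first; anything still in `leftover` after the final round would go to manual annotation.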

English keywords:

natural language processing; text generation; sentiment style transfer; recurrent model; editing-based model; family dialogue

Authors:

Hu Yuting (胡玉婷), Zuo Jiali (左家莉), Liu Jiangsheng (刘江盛), Wan Jianyi (万剑怡), Wang Mingwen (王明文)

Author affiliation:

School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022

Keywords:

natural language processing; text generation; sentiment style transfer; recurrent model; editing model; family dialogue

Funding:

National Natural Science Foundation of China; National Natural Science Foundation of China

Grant numbers:

61866018; 62266021

Publication year:

2024

DOI:

10.16511/j.cnki.qhdxxb.2023.22.052

Journal of Tsinghua University (Science and Technology)

Tsinghua University

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.586

ISSN: 1000-0054

Year, volume (issue): 2024, 64(5)

Number of references: 27