Semi-supervised Text Style Transfer Method Based on Multi-reward Reinforcement Learning
Text style transfer is an important task in natural language processing that aims to change the stylistic attributes of a text while preserving its essential semantic content. However, large-scale parallel corpora are unavailable for many tasks, and existing unsupervised methods suffer from insufficient text diversity and poor semantic consistency. To address these problems, this paper proposes a semi-supervised multi-stage training framework. The framework first constructs a pseudo-parallel corpus using a style labeling model and a masked language model, and uses it to guide the model, in a supervised manner, toward learning diverse transfer patterns. It then designs an adversarial similarity reward, a Mis reward, and a style reward, and applies reinforcement learning on unlabeled data to improve the model's semantic consistency, logical consistency, and style accuracy. On the sentiment polarity transfer task based on the YELP dataset, the proposed method improves the BLEURT score by 3.1%, the Mis score by 2.5%, and the BLEU score by 9.5%. On the formality transfer experiment based on the GYAFC dataset, it improves the BLEURT score by 6.2% and the BLEU score by 3%.
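
The abstract names three rewards but does not say how they are combined or which policy-gradient estimator is used. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each reward is a per-sample score in [0, 1], that the rewards are mixed by a weighted average, and that the update is plain REINFORCE with a mean-reward baseline. The names `RewardWeights`, `combined_reward`, and `reinforce_loss` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical mixing weights; the abstract does not specify any.
    similarity: float = 1.0
    mis: float = 1.0
    style: float = 1.0


def combined_reward(similarity: float, mis: float, style: float,
                    w: RewardWeights = RewardWeights()) -> float:
    """Weighted average of the three per-sample rewards (assumed rule)."""
    total = w.similarity + w.mis + w.style
    return (w.similarity * similarity + w.mis * mis + w.style * style) / total


def reinforce_loss(log_probs: Sequence[float], rewards: Sequence[float]) -> float:
    """REINFORCE loss with a mean-reward baseline.

    log_probs[i] is the summed token log-probability of sampled output i,
    and rewards[i] is the combined reward of that output.
    """
    if len(log_probs) != len(rewards) or len(rewards) == 0:
        raise ValueError("log_probs and rewards must be non-empty and equal in length")
    # Subtracting the batch mean reduces the variance of the gradient estimate.
    baseline = sum(rewards) / len(rewards)
    weighted = sum((r - baseline) * lp for lp, r in zip(log_probs, rewards))
    return -weighted / len(rewards)
```

In a real training loop the log-probabilities would come from the generator and the loss would be differentiated by an autodiff framework; the sketch only shows the arithmetic, in which samples scoring above the batch average have their log-probability pushed up and those below have it pushed down.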

Keywords: Text generation; Text style transfer; Multi-stage training; Style labeling model; Reinforcement learning

Authors: Li Jingwen (李静文), Ye Qi (叶琪), Ruan Tong (阮彤), Lin Yupian (林宇翩), Xue Wandong (薛万东)

School of Computer Science and Engineering, East China University of Science and Technology, Shanghai 200237, China

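Funding: Shanghai Special Fund for Promoting High-Quality Industrial Development (2021-GZL-RGZN-01018); National Key Research and Development Program of China (2021YFC2701800, 2021YFC2701801)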

2024

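Journal: Computer Science (计算机科学)
Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)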

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.944
ISSN:1002-137X
Year, volume (issue): 2024, 51(8)