Research on a Low-Resource Cross-Lingual Summarization Method Based on Multi-Strategy Reinforcement Learning

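Cross-lingual summarization (CLS) aims to generate a summary in a target language (e.g., Chinese) from a document in a source language (e.g., Vietnamese). End-to-end CLS models perform well when trained on large-scale, high-quality labeled data, which is usually built by using a machine translation model to translate a monolingual summarization corpus into a CLS corpus. However, because translation models for low-resource languages perform poorly, translation noise is introduced into the CLS corpus and degrades CLS model performance. This paper proposes a multi-strategy low-resource cross-lingual summarization method. It uses multi-strategy reinforcement learning to address CLS model training on noisy low-resource training data, introducing the source-language summary as an additional supervision signal to mitigate the effect of noisy translated target summaries. The reinforcement reward is learned by computing the word relevance and the degree of word omission between the source-language summary and the generated target-language summary, and the CLS model is optimized under the joint constraint of the cross-entropy loss and the reinforcement reward. To validate the proposed model, a noisy Chinese-Vietnamese CLS corpus is constructed. Experiments on Chinese-Vietnamese and Vietnamese-Chinese cross-lingual summarization datasets show that the proposed model's ROUGE scores are clearly higher than those of the other baseline models; compared with the NCLS baseline, ROUGE-1 improves by 0.71 and 0.84, respectively, indicating that the method reduces noise interference and improves the quality of the generated summaries.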
Research on Low-Resource Cross-Lingual Summarization Method Based on Multi-Strategy Reinforcement Learning
Cross-Lingual Summarization (CLS) aims to generate a summary in a target language (such as Chinese) given a document in a source language (such as Vietnamese). End-to-end CLS models achieve better performance with large-scale, high-quality labeled data, which are usually constructed by machine-translating monolingual summarization corpora into CLS corpora. However, the limited performance of translation models for low-resource languages introduces noise into the CLS corpus, which lowers the performance of the CLS model. This paper proposes a low-resource CLS method based on multiple strategies. Multi-strategy reinforcement learning is used to train CLS models on noisy low-resource training data, and source-language summaries are introduced as additional supervisory signals to alleviate the impact of noisy translated target summaries. To learn the reinforcement rewards, the word correlation and the degree of missing words between the source-language summary and the generated target-language summary are calculated, and the CLS model is optimized under the constraints of the cross-entropy loss and the reinforcement rewards. To verify the performance of the proposed model, a noisy Chinese-Vietnamese CLS corpus is constructed. Experimental results on the Chinese-Vietnamese and Vietnamese-Chinese CLS datasets show that the proposed model achieves significantly better ROUGE scores than the baseline models. Compared with the NCLS baseline, it improves ROUGE-1 by 0.71 and 0.84, respectively, which indicates that it weakens noise interference and improves the quality of the generated summaries.
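
The abstract gives no formulas, so the following is only a minimal sketch of how a reward built from word relevance and word omission might be mixed with a cross-entropy loss. Everything in it is an assumption made for illustration, not the authors' implementation: the bilingual lexicon, the function names, the equal weighting of the two reward terms, the REINFORCE-style loss with a baseline, and the mixing weight gamma.

```python
def relevance_and_missing(src_summary, gen_summary, lexicon):
    """Return (relevance, missing) for tokenized summaries.

    lexicon is a hypothetical mapping from a source word to the set of
    target words accepted as its translation.
    """
    gen_words = set(gen_summary)
    # Source words with at least one translation in the generated summary.
    covered = [w for w in src_summary if lexicon.get(w, set()) & gen_words]
    missing = 1.0 - len(covered) / len(src_summary) if src_summary else 0.0
    # Target words that translate any word of the source summary.
    src_targets = set().union(*(lexicon.get(w, set()) for w in src_summary))
    hits = sum(1 for t in gen_summary if t in src_targets)
    relevance = hits / len(gen_summary) if gen_summary else 0.0
    return relevance, missing


def reward(src_summary, gen_summary, lexicon):
    """Assumed reward: high relevance and low omission, equally weighted."""
    relevance, missing = relevance_and_missing(src_summary, gen_summary, lexicon)
    return 0.5 * (relevance + (1.0 - missing))


def mixed_loss(ce_loss, sample_log_prob, sample_reward, baseline, gamma=0.5):
    """Assumed objective: gamma * RL loss + (1 - gamma) * cross-entropy loss."""
    rl_loss = -(sample_reward - baseline) * sample_log_prob
    return gamma * rl_loss + (1.0 - gamma) * ce_loss


# Toy Vietnamese-to-Chinese example with a three-entry hypothetical lexicon.
lexicon = {"kinh_tế": {"经济"}, "tăng_trưởng": {"增长"}, "mạnh": {"强劲"}}
src = ["kinh_tế", "tăng_trưởng", "mạnh"]
gen = ["经济", "增长", "放缓"]
r = reward(src, gen, lexicon)
loss = mixed_loss(ce_loss=2.0, sample_log_prob=-1.5, sample_reward=r, baseline=0.5)
```

In this toy case two of the three source words are covered and two of the three generated words are relevant, so the reward sits between 0 and 1. In a real system the lexicon would likely be replaced by learned alignment or translation probabilities.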

Chinese-Vietnamese Cross-Lingual Summarization (CLS); low-resource; noise data; noise analysis; multi-strategy reinforcement learning

Feng Xiongbo (冯雄波), Huang Yuxin (黄于欣), Lai Hua (赖华), Gao Yumeng (高玉梦)

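Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, Yunnan, China

Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650504, Yunnan, China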

Chinese-Vietnamese cross-lingual summarization; low-resource; noisy data; noise analysis; multi-strategy reinforcement learning

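National Natural Science Foundation of China (U21B2027); Yunnan Provincial Major Science and Technology Special Project (202202AD080003); Yunnan Fundamental Research Program, General Projects (202201AT070915, 202201AT070768); Kunming University of Science and Technology "Double First-Class" Joint Special Project (202201BE070001-021)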

2024

Computer Engineering
East China Institute of Computing Technology; Shanghai Computer Society

CSTPCD; Peking University Core Journal
Impact factor: 0.581
ISSN: 1000-3428
Year, volume (issue): 2024, 50(2)