Design of Automatic Machine Translation System Based on Data Generalization

To alleviate the problems of out-of-vocabulary words, over-translation, and under-translation in existing neural machine translation systems, this study proposes a Chinese-English automatic translation system based on improved data generalization. Data augmentation and decoding strategies are incorporated to obtain higher-quality pseudo-bilingual sentence pairs, which avoids the need for the system to store multiple models. In addition, a Chinese-English machine translation model that fuses multiple coverage mechanisms is introduced to reduce over-translation and under-translation. The results show that the proposed method reaches a stable state by the 41st and the 19th iteration. With a training set of 6 × 10⁵ samples, the proposed method attains a relatively high Macro-F1 of 97.8%. In the comparison of practical performance, for source sentences in the length range above 50, the BLEU value of the proposed method reaches 98.23%. These results indicate that the proposed method can effectively improve the translation accuracy of Chinese-English automatic translation systems and can handle sentences of different lengths, providing a new reference scheme for further improving the performance of automatic machine translation systems.

automatic translation system; data generalization; English; neural machine translation

Zhang Xiaohui (张晓辉)

Xi'an Fanyi University, Xi'an 710105

Shaanxi Provincial Education Planning Project, 14th Five-Year Plan (SGH22Y1752); Xi'an Fanyi University Research Team Project (XFU21KYTDB01)

2024

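Automation & Instrumentation (自动化与仪器仪表)
Chongqing Industrial Automation Instrumentation Research Institute; Chongqing Society of Automation and Instrumentation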

CSTPCD
Impact factor: 0.327
ISSN:1001-9227
Year, volume (issue): 2024, (1)