Design of Automatic Machine Translation System Based on Data Generalization

Zhang Xiaohui (张晓辉)¹

Author information

1. Xi'an Fanyi University, Xi'an 710105

Abstract

To alleviate out-of-vocabulary words, over-translation, and under-translation in existing neural machine translation systems, this study proposes a Chinese-English automatic translation system based on improved data generalization. Data augmentation and a decoding strategy are incorporated to obtain higher-quality pseudo-bilingual sentence pairs, which avoids the need for the system to store multiple models. In addition, a Chinese-English machine translation model that fuses multiple coverage mechanisms is introduced to reduce over-translation and under-translation. The results show that the proposed method reached a stable state at the 41st and 19th iterations. When the training set contained 6×10⁵ samples, the proposed method achieved a high Macro-F1 of 97.8%. In the practical comparison, when the source sentence length was above 50, the BLEU value of the proposed method reached 98.23%. These results indicate that the proposed method can effectively improve the translation accuracy of Chinese-English automatic translation systems and can handle sentences of different lengths, providing a new reference for improving the performance of future automatic machine translation systems.

Key words

data generalization/neural machine translation/English/automatic translation system

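Funding

Shaanxi Provincial Education Planning Project (14th Five-Year Plan) (SGH22Y1752)

Xi'an Fanyi University University-Level Research Team Project (XFU21KYTDB01)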

Publication year

2024

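自动化与仪器仪表 (Automation & Instrumentation)

Chongqing Industrial Automation Instrumentation Research Institute; Chongqing Society of Automation and Instrumentation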

CSTPCD

Impact factor: 0.327

ISSN: 1001-9227

Number of references: 11
