
Survey on Document-level Neural Machine Translation
Machine translation (MT) aims to build an automatic system that transforms a given source-language sequence into a target-language sequence with the same meaning. Owing to its wide range of application scenarios, MT has become an important research direction in natural language processing and, more broadly, artificial intelligence. In recent years, end-to-end neural machine translation (NMT) has significantly outperformed statistical machine translation (SMT) and become the mainstream approach in MT research. However, NMT systems generally take the sentence as the translation unit; in document-level translation scenarios, translating each sentence of a document independently detaches it from the discourse context and causes discourse-level errors such as word mistranslation and incoherence across sentences. Incorporating document-level information into the translation process is therefore a more natural and reasonable way to address such cross-sentence discourse errors. This is precisely the goal of document-level neural machine translation (DNMT), which has become a popular direction in MT research. This survey reviews recent work on DNMT and systematically summarizes it in terms of discourse evaluation methods, datasets, and model architectures, so that researchers can quickly grasp the current state of DNMT research and its future directions. It also discusses prospects, difficulties, and challenges in DNMT, hoping to offer researchers some inspiration.
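One widely known family of DNMT approaches the abstract alludes to feeds document context to an otherwise sentence-level model by concatenating preceding source sentences to the current one. Below is a minimal, hypothetical sketch of that input-preparation step (the function name, separator token, and context size are illustrative assumptions, not from the survey):

```python
def build_doc_inputs(sentences, context_size=1, sep=" <SEP> "):
    """Prepend up to `context_size` previous sentences to each sentence,
    joined by a separator token, so a sentence-level NMT model can attend
    to document context (a simple concatenation strategy; names are
    illustrative)."""
    inputs = []
    for i, sent in enumerate(sentences):
        ctx = sentences[max(0, i - context_size):i]  # preceding context
        inputs.append(sep.join(ctx + [sent]))
    return inputs

doc = ["He bought a bat.", "It was made of wood."]
print(build_doc_inputs(doc))
# The second input carries the previous sentence, giving the model the
# antecedent needed to translate the pronoun "It" consistently.
```

A real system would apply the same grouping on the target side during training, or use a dedicated context encoder instead of plain concatenation; this sketch only illustrates why cross-sentence information helps with the pronoun and coherence errors described above.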

neural machine translation; Transformer; document-level context; discourse evaluation

LYU Xinglin, LI Junhui, TAO Shimin, YANG Hao, ZHANG Min


School of Computer Science and Technology, Soochow University, Suzhou 215006, China

Huawei Translation Services Center, Beijing 100080, China


2025

Journal of Software
Institute of Software, Chinese Academy of Sciences; China Computer Federation


Peking University Core Journal
Impact factor: 2.833
ISSN:1000-9825
Year, Volume (Issue): 2025, 36(1)