Research on discourse structure in document-level neural machine translation
[Objective] Document-level machine translation systems are expected to produce better translations by taking contextual information into account. However, most existing work focuses on building effective network architectures for exploiting context from the model side, while neglecting the discourse structure inside the source text, which leaves the context under-utilized. [Methods] Therefore, guided by rhetorical structure theory, a rich representation is designed for elementary discourse units (EDUs). Using a carefully designed algorithm, it encodes each EDU's text coverage, information score, and not only the simple nucleus-satellite relation but also the complex rhetorical relation to its neighboring EDU. [Results] The proposed method preserves the rhetorical relation information of each EDU to the greatest extent possible without increasing the sequence length. Experimental results on four datasets from two language pairs show that the improved model outperforms multiple strong baseline systems by 1 BLEU point, and also shows significant gains on the quantitative evaluation proposed herein, which is based on the distribution characteristics of EDUs. [Conclusion] Being efficient, flexible, and widely applicable, the proposed method can be readily applied to multiple document-level neural machine translation models.
neural machine translation; discourse analysis; document-level translation; rhetorical structure theory
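The abstract describes the EDU representation only at a high level. As a rough illustration of the kind of per-EDU feature record it implies, the following minimal Python sketch packs the features named above (text coverage, information score, nuclearity, and the rhetorical relation to a neighboring EDU) into a fixed-size vector that could be embedded and added to the encoder states of the tokens inside the EDU, so the source sequence length itself does not grow. All names, fields, and values here are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch (not the paper's code): a per-EDU feature record and a
# small numeric vector derived from it, under the assumptions stated above.
from dataclasses import dataclass
from enum import Enum


class Nuclearity(Enum):
    NUCLEUS = 0    # the more central span in an RST relation
    SATELLITE = 1  # the supporting span


@dataclass
class EDUFeatures:
    span: tuple                    # (start, end) token indices covered by this EDU
    coverage: float                # fraction of the sentence/document it covers
    info_score: float              # information score, e.g. content-word density
    nuclearity: Nuclearity         # nucleus vs. satellite role
    relation_to_neighbor: str      # RST relation label, e.g. "Elaboration"


def edu_feature_vector(edu: EDUFeatures, relation_vocab: dict) -> list:
    """Pack the EDU record into a small numeric vector; in a document-level
    NMT model this could be embedded and added to the hidden states of the
    tokens inside edu.span without lengthening the input sequence."""
    return [
        edu.coverage,
        edu.info_score,
        float(edu.nuclearity.value),
        float(relation_vocab.get(edu.relation_to_neighbor, 0)),
    ]


if __name__ == "__main__":
    vocab = {"Elaboration": 1, "Contrast": 2, "Background": 3}
    edu = EDUFeatures(span=(0, 7), coverage=0.35, info_score=0.62,
                      nuclearity=Nuclearity.SATELLITE,
                      relation_to_neighbor="Elaboration")
    print(edu_feature_vector(edu, vocab))  # -> [0.35, 0.62, 1.0, 1.0]
```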