Research on discourse structure in document-level neural machine translation
[Objective] Document-level machine translation systems are expected to produce better translations by taking contextual information into account. However, most existing work focuses on building effective network architectures for exploiting context from the model side, while neglecting the discourse structure inside the source text, which leaves the context under-utilized. [Methods] Therefore, guided by rhetorical structure theory, a rich representation is designed for elementary discourse units (EDUs). Using a carefully designed algorithm, it encodes each EDU's text coverage, information score, and not only the simple nucleus-satellite relation but also the complex rhetorical relation to its neighboring EDU. [Results] The proposed method preserves the rhetorical relation information of each EDU to the greatest extent possible without increasing the sequence length. Experimental results on four datasets from two language pairs show that the improved model outperforms multiple strong baseline systems by 1 BLEU point, and also shows significant gains on the quantitative evaluation proposed herein, which is based on the distribution characteristics of EDUs. [Conclusion] Being efficient, flexible, and widely applicable, the proposed method can be readily applied to multiple document-level neural machine translation models.
neural machine translation; discourse analysis; document-level translation; rhetorical structure theory
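The abstract describes the EDU representation only at a high level. As a rough illustration of the kind of per-EDU feature record it implies, the following minimal Python sketch packs the features named above (text coverage, information score, nuclearity, and the rhetorical relation to a neighboring EDU) into a fixed-size vector that could be embedded and added to the encoder states of the tokens inside the EDU, so the source sequence length itself does not grow. All names, fields, and values here are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch (not the paper's code): a per-EDU feature record and a
# small numeric vector derived from it, under the assumptions stated above.
from dataclasses import dataclass
from enum import Enum


class Nuclearity(Enum):
    NUCLEUS = 0    # the more central span in an RST relation
    SATELLITE = 1  # the supporting span


@dataclass
class EDUFeatures:
    span: tuple                    # (start, end) token indices covered by this EDU
    coverage: float                # fraction of the sentence/document it covers
    info_score: float              # information score, e.g. content-word density
    nuclearity: Nuclearity         # nucleus vs. satellite role
    relation_to_neighbor: str      # RST relation label, e.g. "Elaboration"


def edu_feature_vector(edu: EDUFeatures, relation_vocab: dict) -> list:
    """Pack the EDU record into a small numeric vector; in a document-level
    NMT model this could be embedded and added to the hidden states of the
    tokens inside edu.span without lengthening the input sequence."""
    return [
        edu.coverage,
        edu.info_score,
        float(edu.nuclearity.value),
        float(relation_vocab.get(edu.relation_to_neighbor, 0)),
    ]


if __name__ == "__main__":
    vocab = {"Elaboration": 1, "Contrast": 2, "Background": 3}
    edu = EDUFeatures(span=(0, 7), coverage=0.35, info_score=0.62,
                      nuclearity=Nuclearity.SATELLITE,
                      relation_to_neighbor="Elaboration")
    print(edu_feature_vector(edu, vocab))  # -> [0.35, 0.62, 1.0, 1.0]
```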