Uncovering machine translationese: On syntactic properties of neural machine-translated texts
Despite advances in neural machine translation (NMT), the persistence of "machine translationese," characterised by unsatisfactory intelligibility and idiomaticity, remains a challenge. Existing studies have not clarified what "machine translationese" is, and little is known about its deep syntactic properties. Based on self-built dependency treebanks consisting of machine- and human-translated texts in the English-to-Chinese direction, we compared the syntactic properties of these texts in terms of dependency distance and dependency direction. The findings indicate that NMT is significantly deficient in controlling the syntactic complexity of long sentences, as evidenced by the improper use of passive structures and prepositional phrases, both of which may contribute to unintelligibility. Additionally, a preference for nominal structures, which characterise English, is evident in machine translation through the use of adverbial, right adjunct, and prepositional-object relations. This leads to differences in the word order distribution between the human- and machine-translated texts. These examples of machine translationese, captured at the syntactic level, shed light on machine translation quality assessment and post-editing.
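
The two measures named above, dependency distance and dependency direction, can be illustrated with a minimal Python sketch. It assumes the definitions common in quantitative dependency linguistics, which the abstract itself does not spell out: dependency distance is the absolute difference between the linear positions of a dependent and its head, and a relation counts as head-final when the head follows its dependent. The data layout, the function names, and the toy parse are hypothetical and are not taken from the study's treebanks.

```python
from typing import List, Tuple

# Assumed layout: each token is (position, head_position), with 1-based
# positions and head_position 0 marking the root of the sentence.
Sentence = List[Tuple[int, int]]


def mean_dependency_distance(sentence: Sentence) -> float:
    """Average absolute head-dependent distance, excluding the root."""
    distances = [abs(head - pos) for pos, head in sentence if head != 0]
    return sum(distances) / len(distances) if distances else 0.0


def head_final_ratio(sentence: Sentence) -> float:
    """Share of non-root relations whose head follows the dependent."""
    links = [(pos, head) for pos, head in sentence if head != 0]
    if not links:
        return 0.0
    head_final = sum(1 for pos, head in links if head > pos)
    return head_final / len(links)


# Hypothetical three-token parse: token 2 is the root and governs
# token 1 (to its left) and token 3 (to its right).
toy = [(1, 2), (2, 0), (3, 2)]

if __name__ == "__main__":
    print(mean_dependency_distance(toy))
    print(head_final_ratio(toy))
```

Averaging these per-sentence values separately over a machine-translated and a human-translated treebank would give the kind of comparison the abstract describes, although the aggregation and significance testing used in the study may differ.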