
Chinese-Vietnamese neural machine translation based on syntactic structure features

In low-resource neural machine translation, the translation quality of long sentences is generally poor. Chinese and Vietnamese differ considerably and form a typical low-resource language pair, so long sentences should be processed in a way that preserves their semantic information as far as possible. A method for processing long sentences based on syntactic structure features is therefore proposed. First, the long sentences in the original corpus are parsed into syntax trees. Then, according to each parse tree, short sentences are extracted and the leaf-node words far from the root node are marked. Finally, the extracted short sentences are back-translated to generate pseudo-parallel data that extends the corpus, and during training each marked word in the original long sentences is replaced by a weighted combination of words semantically similar to it. Experiments show that the method improves model performance and significantly improves the translation quality of long sentences.
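The tree-based steps in the abstract (extracting short sentences and marking leaf words far from the root) and the weighted combination of similar words can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the nested-tuple tree encoding, the `IP` clause label, the depth threshold, the function names, and the use of a normalized weighted average of embedding vectors. The parser and the back-translation model are left out.

```python
# Illustrative sketch only: tree encoding, labels, thresholds, and function
# names are hypothetical and not the paper's implementation.
# A tree is either a word (str) or a tuple (label, child, child, ...).

def leaves(tree):
    """Return the words under a tree node, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def extract_short_sentences(tree, clause_labels=("IP",), min_len=2):
    """Collect the word sequences of clause-level subtrees below the root."""
    found = []

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root and node[0] in clause_labels:
            words = leaves(node)
            if len(words) >= min_len:
                found.append(words)
        for child in node[1:]:
            visit(child, False)

    visit(tree, True)
    return found

def mark_deep_leaves(tree, min_depth):
    """Return (word, marked) pairs; a word is marked if its depth >= min_depth."""
    marked = []

    def visit(node, depth):
        if isinstance(node, str):
            marked.append((node, depth >= min_depth))
            return
        for child in node[1:]:
            visit(child, depth + 1)

    visit(tree, 0)
    return marked

def weighted_combination(vectors, weights):
    """Normalized weighted average of the embeddings of similar words."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(dim)]

# Toy parse of "我 知道 他 喜欢 越南 咖啡" (hand-written, not parser output).
tree = ("IP",
        ("NP", "我"),
        ("VP", "知道",
         ("IP", ("NP", "他"),
          ("VP", "喜欢", ("NP", "越南", "咖啡")))))

short_sentences = extract_short_sentences(tree)
marks = mark_deep_leaves(tree, min_depth=5)
mixed = weighted_combination([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

In this toy case the embedded clause becomes one candidate short sentence for back-translation, and only the two deepest words are marked for replacement. How the paper chooses similar words and their weights is not stated in the abstract.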

Keywords: low-resource neural machine translation; long-sentence translation; Chinese-Vietnamese language; semantic information; syntactic structure features

Pei Feifei (裴非非), Yang Jian (杨舰)


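Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500

Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500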


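Funding:
National Key Research and Development Program of China: 2019QY1801, 2019QY1802, 2019QY1800
National Natural Science Foundation of China: 61972186, 61732005, U21B2027
Yunnan High-Tech Industry Development Project: 201606
Yunnan Provincial Major Science and Technology Special Plan: 202103AA080015, 202002AD080001-5
Yunnan Basic Research Program: 202001-AS070014
Yunnan Reserve Talents for Academic and Technical Leaders: 202105AC160018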

2024

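Information Technology (信息技术)
Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center of the Ministry of Information Industry of China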


CSTPCD
Impact factor: 0.413
ISSN:1009-2552
Year, volume (issue): 2024, (2)
  • 11