Information Technology (信息技术), 2024, Issue 2: 15-21. DOI: 10.13274/j.cnki.hdzj.2024.02.003

Chinese-Vietnamese neural machine translation based on syntactic structure features

Pei Feifei¹, Yang Jian¹

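Author information

  • 1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500; Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500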

Abstract

In low-resource neural machine translation, the translation quality of long sentences is generally poor. Chinese and Vietnamese differ considerably, and Chinese-Vietnamese is a typical low-resource language pair, so long sentences should be processed in a way that preserves their semantic information as far as possible. This paper therefore proposes a method for processing long sentences based on syntactic structure features. First, the long sentences in the original corpus are parsed into syntactic trees. Next, short sentences are extracted according to the parse trees, and leaf-node words far from the root node are marked. Finally, the extracted short sentences are back-translated to generate pseudo-parallel data that augments the corpus, and each marked word in the original long sentences is replaced during training by a weighted combination of semantically similar words. Experiments show that the method improves model performance and significantly improves the translation quality of long sentences.
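
The tree-based steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a constituency tree encoded as nested tuples `(label, child, ...)` with plain strings as words. The function names, the depth threshold, the clause label `IP`, and the normalised weighted sum are all illustrative assumptions. Parsing and back-translation are omitted because they require trained models.

```python
from typing import List, Sequence, Tuple, Union

# Hypothetical tree encoding: (label, child, ...); a leaf word is a plain str.
Tree = Union[str, tuple]


def leaves_with_depth(tree: Tree, depth: int = 0) -> List[Tuple[str, int]]:
    """Return (word, depth) for every leaf, left to right; the root has depth 0."""
    if isinstance(tree, str):
        return [(tree, depth)]
    _label, *children = tree
    pairs: List[Tuple[str, int]] = []
    for child in children:
        pairs.extend(leaves_with_depth(child, depth + 1))
    return pairs


def mark_deep_leaves(tree: Tree, min_depth: int) -> List[str]:
    """Mark words whose leaf depth is at least min_depth (threshold is an assumption)."""
    return [word for word, depth in leaves_with_depth(tree) if depth >= min_depth]


def extract_clauses(tree: Tree, clause_labels: Sequence[str] = ("IP",),
                    is_root: bool = True) -> List[List[str]]:
    """Collect the words of embedded clause subtrees as candidate short sentences."""
    if isinstance(tree, str):
        return []
    label, *children = tree
    found: List[List[str]] = []
    if label in clause_labels and not is_root:
        found.append([word for word, _ in leaves_with_depth(tree)])
    for child in children:
        found.extend(extract_clauses(child, clause_labels, is_root=False))
    return found


def weighted_combination(neighbor_vecs: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> List[float]:
    """Normalised weighted sum of the embeddings of semantically similar words."""
    total = sum(weights)
    dim = len(neighbor_vecs[0])
    return [sum(w * vec[i] for w, vec in zip(weights, neighbor_vecs)) / total
            for i in range(dim)]


# Toy tree for "我 知道 他 来" ("I know he comes").
tree = ("IP",
        ("NP", ("PN", "我")),
        ("VP", ("VV", "知道"),
               ("IP", ("NP", ("PN", "他")), ("VP", ("VV", "来")))))

marked = mark_deep_leaves(tree, min_depth=4)   # words inside the embedded clause
short_sentences = extract_clauses(tree)         # the embedded clause as a short sentence
soft_vec = weighted_combination([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

In this toy example the embedded clause is both the source of the extracted short sentence and the location of the marked words. In a real pipeline the extracted sentences would then go to a back-translation model, and the combined vector would stand in for the marked word's embedding during training.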

Key words

low-resource neural machine translation/long-sentence translation/Chinese-Vietnamese languages/semantic information/syntactic structure features

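Funding

National Key Research and Development Program of China (2019QY1801)

National Key Research and Development Program of China (2019QY1802)

National Key Research and Development Program of China (2019QY1800)

National Natural Science Foundation of China (61972186)

National Natural Science Foundation of China (61732005)

National Natural Science Foundation of China (U21B2027)

Yunnan High-Tech Industry Development Project (201606)

Yunnan Provincial Major Science and Technology Special Plan (202103AA080015)

Yunnan Provincial Major Science and Technology Special Plan (202002AD080001-5)

Yunnan Provincial Basic Research Program (202001-AS070014)

Yunnan Province Reserve Talents for Academic and Technical Leaders (202105AC160018)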

Publication year

2024
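Information Technology (信息技术)
Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center of the Ministry of Information Industry of China

CSTPCD
Impact factor: 0.413
ISSN: 1009-2552
Number of references: 11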