Library Tribune (图书馆论坛), 2024, Vol. 44, Issue 10: 93-102.

A Study on Machine Translation of Ancient Chinese Books from the Perspective of Temporal Characteristics

(我国古代典籍时代特征视角下的机器翻译研究)

Wu Mengcheng¹ (吴梦成), Lin Litao² (林立涛), Hu Die¹ (胡蝶), Liu Chang¹ (刘畅), Huang Shuiqing¹ (黄水清), Meng Kai³ (孟凯), Wang Dongbo¹ (王东波)

Author information

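  • 1. College of Information Management, Nanjing Agricultural University
  • 2. School of Information Management, Nanjing University
  • 3. College of Marxism, Nanjing Agricultural University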

Abstract

China's surviving classical books were written in different eras, and both the linguistic style and the content of each text bear the marks of its period. Taking machine translation from ancient Chinese into modern Chinese as its starting point, this article examines the temporal characteristics of ancient Chinese books and how they affect the machine translation of these texts, and proposes training translation models separately for different historical periods in order to improve the quality of classical-text translation. The study uses A Complete Translation of Twenty-Four Histories (《二十四史全译》) as its research corpus and divides it into three periods: ancient (远古), medieval (中古), and near-ancient (近古). From a computational humanities perspective, it applies quantitative statistical methods to compare word frequency, part of speech, and dependency relations in texts from the three periods. Then, after data augmentation, it trains several machine translation models separately on each period's corpus and compares their translation performance. The results show that classical texts differ in their temporal characteristics and that these differences have a significant impact on machine translation performance; training separate machine translation models for texts from different periods improves both the accuracy and the fluency of classical-text translation.
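The abstract reports comparing word frequency across the three periods with statistical methods but does not name the measures used. The Python below is a minimal sketch under stated assumptions, not the authors' method: each character is treated as one token (a rough stand-in for real word segmentation), the rate of a hypothetical set of classical function words is compared per period, and Jensen-Shannon divergence (a measure swapped in here, not one the abstract names) quantifies how far two periods' frequency profiles diverge. `CORPORA`, `FUNCTION_WORDS`, and all function names are hypothetical, and the sentences are toy placeholders rather than text from A Complete Translation of Twenty-Four Histories.

```python
from collections import Counter
import math

# Toy placeholder sentences, NOT taken from the paper's corpus; the three keys
# mirror the abstract's ancient / medieval / near-ancient split.
CORPORA = {
    "ancient": ["王曰善哉", "臣闻之也"],
    "medieval": ["帝曰可矣", "遂行之"],
    "near_ancient": ["上曰可", "遂命将出师"],
}

# Hypothetical set of classical function words, chosen only for this sketch.
FUNCTION_WORDS = set("之乎者也矣焉哉")


def char_frequencies(sentences):
    """Relative frequency of each character.

    Assumption: one character is treated as one token, a rough stand-in
    for proper word segmentation of classical Chinese.
    """
    counts = Counter(ch for s in sentences for ch in s)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()} if total else {}


def function_word_rate(sentences):
    """Share of characters that belong to FUNCTION_WORDS."""
    chars = [ch for s in sentences for ch in s]
    return sum(ch in FUNCTION_WORDS for ch in chars) / len(chars) if chars else 0.0


def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so the value lies in [0, 1]."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_m(dist):
        # Only tokens with nonzero probability contribute to KL(dist || m).
        return sum(v * math.log2(v / m[w]) for w, v in dist.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


if __name__ == "__main__":
    profiles = {period: char_frequencies(s) for period, s in CORPORA.items()}
    for period, sentences in CORPORA.items():
        print(f"{period}: function-word rate = {function_word_rate(sentences):.3f}")
    periods = list(profiles)
    for i, a in enumerate(periods):
        for b in periods[i + 1:]:
            print(f"{a} vs {b}: JSD = {js_divergence(profiles[a], profiles[b]):.3f}")
```

Run as a script, it prints each period's function-word rate and the pairwise divergences for the toy data. A real replication would need a dedicated classical-Chinese toolkit for segmentation, part-of-speech tagging, and dependency parsing, which this sketch does not attempt.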

Key words

computational humanities/Twenty-Four Histories/temporal characteristics of classics/data augmentation/machine translation


Publication year: 2024
Journal: Library Tribune (图书馆论坛); sponsor: Sun Yat-sen Library of Guangdong Province


Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals (北大核心)
Impact factor: 1.864
ISSN: 1002-1167