Study on the Construction of an Intelligent Machine Translation Model for TCM Ancient Books

宋熹玥 (SONG Xiyue)¹, 周净 (ZHOU Jing)¹, 刘伟 (LIU Wei)¹


Author information

1. School of Information Science and Engineering, Hunan University of Chinese Medicine, Changsha, Hunan 410208, China

Abstract

Objective To construct a scientific and standardized intelligent machine translation model for TCM ancient books that accurately translates them into modern Chinese or English, and to provide a reference for clinical medical study and the dissemination of TCM.

Methods First, machine translation of TCM ancient books was studied, and in the initial experiments a sentence-level parallel corpus of 969,754 parallel sentence pairs was constructed. Second, a Seq2Seq model with an attention mechanism (Seq2Seq+Attention) was built, and a pre-trained Seq2Seq model (Pre-Training+Seq2Seq) was trained on 800,000 classical Chinese poems. Finally, experiments were conducted on the constructed dataset, with BLEU1, BLEU2 and F1 as evaluation metrics, to verify the effectiveness of the models and the feasibility of further optimization.

Results The Pre-Training+Seq2Seq model achieved an F1 score of 65.72%.

Conclusion The Pre-Training+Seq2Seq model performs well and offers an approach to intelligent machine translation of TCM ancient books.
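The abstract names BLEU1, BLEU2 and F1 as evaluation metrics but does not define how they were computed. The following is a minimal Python sketch of one common way to compute them, assuming character-level tokens (usual for Chinese output), sentence-level cumulative BLEU with uniform n-gram weights and no smoothing, and F1 taken as bag-of-characters overlap between hypothesis and reference. These definitions, and the function names `bleu` and `char_f1`, are illustrative assumptions, not necessarily the authors' exact formulas.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as often as in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n):
    """Sentence-level cumulative BLEU-max_n with uniform weights and no smoothing (assumed variant)."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # geometric mean is zero if any n-gram order has no match
    c, r = len(candidate), len(reference)
    # Brevity penalty: penalize hypotheses shorter than the reference
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


def char_f1(candidate, reference):
    """F1 over overlapping characters, counting each shared character at most min(count) times."""
    common = sum((Counter(candidate) & Counter(reference)).values())
    if common == 0:
        return 0.0
    precision = common / len(candidate)
    recall = common / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    hypothesis = list("天地之间")   # hypothetical model output, split into characters
    reference = list("天地之间人")  # hypothetical reference translation
    print(bleu(hypothesis, reference, 1), bleu(hypothesis, reference, 2), char_f1(hypothesis, reference))
```

With a hypothesis that is a strict prefix of the reference, both precisions are 1, so BLEU1 and BLEU2 are reduced only by the brevity penalty (exp(-0.25) here), while character F1 is 8/9. Corpus-level BLEU, as usually reported, would instead pool n-gram counts over all sentence pairs before taking the geometric mean.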

Key words

TCM ancient books/Classical Chinese/corpus/text alignment/machine translation


Year of publication

2024

Chinese Journal of Library and Information Science for Traditional Chinese Medicine (中国中医药图书情报杂志)

Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences


Impact factor: 0.556

ISSN：2095-5707
